Goal: significantly reduce the processing time (if possible) by making this working code more efficient. Currently, 50k rows by 105 columns of data take about 2 hours overall to process, and this piece accounts for 95% of that time.
This piece is a key part of my Python 3.6.3 script, which compares two lists of lists element by element regardless of datatype. I have spent long hours on it, but it seems I have reached my limits here. It runs on Windows 10.
Sorry about the number of variables. Here is a description:
Ap, Bu – lists of lists. Each list within a list:
a) may contain any datatype (usually string, number, null, date).
b) always has a unique string as its 1st element.
c) has the same number of elements as the other lists.
d) each list in Ap has a corresponding list in Bu (two lists correspond if the 1st element of the list in Ap matches the 1st element of the list in Bu).
prx – index of a list within Ap
urx – index of the corresponding (matching) list within Bu, obtained as urx = l_urx.index(prx)
cx – index of an element in a single list of Ap
ux – index of the corresponding element in the matching list of Bu, obtained as ux = l_ux.index(cx)
rng_lenAp – range(len(Ap))
rng_ls – range over the length of an individual list within Ap
To visualize (just an example):

    Ap = [['egg', 12/12/2000, 10, NULL], ['goog', 23, 100, 12/12/2000]]
    Bu = [['goog', '3434', 100, 12/12/2000], ['egg', 12/12/2000, 45, NULL]]

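For reference, a runnable version of the example above might look like the sketch below. It assumes NULL corresponds to Python's None and the dates to datetime.date values. It also assumes that l_urx holds, for each row of Bu, the index of the matching row in Ap (so that l_urx.index(prx) yields urx), and that the columns are in the same order on both sides. How l_urx and l_ux are actually built is not shown in the script, so their construction here (including the helper name ap_keys) is hypothetical.

```python
from datetime import date

Ap = [['egg', date(2000, 12, 12), 10, None],
      ['goog', 23, 100, date(2000, 12, 12)]]
Bu = [['goog', '3434', 100, date(2000, 12, 12)],
      ['egg', date(2000, 12, 12), 45, None]]

# Hypothetical construction: l_urx[urx] is the index in Ap of the row
# whose 1st element matches Bu[urx][0].
ap_keys = [row[0] for row in Ap]
l_urx = [ap_keys.index(row[0]) for row in Bu]

# Assumption: same column order on both sides, so l_ux is the identity.
l_ux = list(range(len(Ap[0])))

rng_lenAp = range(len(Ap))
rng_ls = range(len(Ap[0]))

# Each Ap row and its matching Bu row share the same 1st element.
for prx in rng_lenAp:
    urx = l_urx.index(prx)
    print(prx, urx, Ap[prx][0], Bu[urx][0])
```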

    for prx in rng_lenAp:
        urx = l_urx.index(prx)
        if Ap[prx] == Bu[urx]:
            for cx in rng_ls:
                ux = l_ux.index(cx)
                # If not header, non-matching cells get recorded with their current value
                if cx != 0 and Ap[prx][cx] != Bu[urx][ux]:
                    output[prx].append(str(Ap[prx][cx]) + '^' + str(Bu[urx][ux]))
                # Unless it is row header or ID in column, matching cells get 'ok'
                elif cx != 0 and prx != 0 and urx != 0 and Ap[prx][cx] == Bu[urx][ux]:
                    output[prx].append('ok' + '^' + 'ok')
                # Anything else gets recorded with its current value
                else:
                    output[prx].append(str(Ap[prx][cx]) + '^' + str(Bu[urx][ux]))

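One thing that may be worth checking: list.index is a linear scan, so calling l_urx.index(prx) once per row and l_ux.index(cx) once per cell repeats the same search many times. The minimal sketch below, assuming l_urx and l_ux are plain lists of unique integers, inverts each list once into a dict so that every later lookup is a single dict access. The helper name invert and the sample lists are hypothetical, not taken from the script.

```python
def invert(index_list):
    """Map each value to its position, like index_list.index(value).

    Assumes the values are unique; with duplicates, list.index returns the
    first position while this dict keeps the last one.
    """
    return {value: pos for pos, value in enumerate(index_list)}


# Hypothetical sample index lists standing in for the real ones.
l_urx = [2, 0, 3, 1]
l_ux = [0, 2, 1]

# Built once, before the row/column loops.
urx_of = invert(l_urx)
ux_of = invert(l_ux)

# The dict lookups agree with the original .index calls.
for prx in range(len(l_urx)):
    assert urx_of[prx] == l_urx.index(prx)
for cx in range(len(l_ux)):
    assert ux_of[cx] == l_ux.index(cx)
```

Inside the loops, urx = urx_of[prx] and ux = ux_of[cx] would then replace the two .index calls. Whether this accounts for most of the 2 hours has not been measured.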
There must be a way to reduce the processing time drastically. Currently, a cell-by-cell comparison of 50k rows by 100 columns of data against 50k rows by 100 columns of data takes about 2 hrs. I would expect it to take under 30 min. Machine: 3.1 GHz, 4 CPUs, 8196 MB RAM.