Extract the column name of the maximum value in each row of a DataFrame in Python

Good day, everyone,

I have the following DataFrame, and for each row I want to extract the column that holds that row's maximum value, as shown below.

```
     E   D   C  B  A  A1
0   27  17  20  3  2   0
1   19  20  13  2  0   0
2   28  22  23  5  2   0
3   42  14  18  3  1   0
4   34  19  12  4  3   0
5   34  20  15  0  1   0
6   32  28  16  4  3   1
7   19  23  17  5  0   0
8   37  17  18  4  2   1
9   33  22  14  1  1   0
10  53  24  16  5  0   0
11  18  17  13  0  0   0
12  33  17  15  4  1   0
13  33  22  12  2  2   0
14  20  19  12  2  1   0
```

I need to get the following list:

[E,D,E,E,E,E,E,D,E,E,E,E,E,E,E] 

This list gives the column in which the maximum value of each row is found.

[27, 20, 28, 42, 34, 34, 32, 23, 37, 33, 53, 18, 33, 33, 20] 

This list holds the maximum value taken from each row of the DataFrame.

I tried the following code, but it didn't work:

```python
Destra = []
for i in range(15):
    DFi = df[i:i+1]
    Destra.append(DFi.values.max())

VEs = []
for j in range(15):
    DFj = df[j:j+1]
    sDF = DFj.loc[::] == Destra[j]
    Vj = sDF.columns.get_values()[True]
    VEs.append(Vj)
VEs
```

Its result is:

```
['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D']
```

which is not what I'm looking for.

Thanks for your help.

Regards
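A minimal sketch of one way to get both lists with pandas, assuming the DataFrame is rebuilt from the table above: `DataFrame.idxmax(axis=1)` returns, for each row, the label of the column holding the maximum, and `DataFrame.max(axis=1)` returns the maximum itself. This avoids comparing each row against its maximum by hand; in the attempt above, the `[True]` lookup does not select the column where the comparison is true, which fits the constant `'D'` result.

```python
import pandas as pd

# Sample data copied from the question (columns E, D, C, B, A, A1).
rows = [
    [27, 17, 20, 3, 2, 0],
    [19, 20, 13, 2, 0, 0],
    [28, 22, 23, 5, 2, 0],
    [42, 14, 18, 3, 1, 0],
    [34, 19, 12, 4, 3, 0],
    [34, 20, 15, 0, 1, 0],
    [32, 28, 16, 4, 3, 1],
    [19, 23, 17, 5, 0, 0],
    [37, 17, 18, 4, 2, 1],
    [33, 22, 14, 1, 1, 0],
    [53, 24, 16, 5, 0, 0],
    [18, 17, 13, 0, 0, 0],
    [33, 17, 15, 4, 1, 0],
    [33, 22, 12, 2, 2, 0],
    [20, 19, 12, 2, 1, 0],
]
df = pd.DataFrame(rows, columns=["E", "D", "C", "B", "A", "A1"])

# Label of the column holding each row's maximum.
max_cols = df.idxmax(axis=1).tolist()
# The maximum value of each row.
max_vals = df.max(axis=1).tolist()

print(max_cols)
print(max_vals)
```

If two columns tie for a row's maximum, `idxmax` returns the first of them in column order.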

Get the maximum value of each row in a grouped pandas DataFrame

I have a pandas DataFrame with the columns UF, Municipio, Classe_Acidente and Total.

In this DataFrame each municipality appears three times, once for each accident class (there are 3 classes), and I need to get the maximum value for each accident class. In other words, I have to go through the whole DataFrame and get the maximum of each class, per UF.

I did:

```python
dfAcidentesPorMunicipiosPorUF = dfAcidentesPorMunicipiosPorUF.groupby(['uf', 'municipio', 'classificacao_acidente'])['classificacao_acidente'].count().reset_index(name="Total")
```

It returned the data grouped correctly, but I can't get the maximum:

```
    uf  municipio     classificacao_acidente  Total
0   AC  ACRELANDIA    Com Vítimas Feridas        10
1   AC  ASSIS BRASIL  Sem Vítimas                 6
2   AC  BRASILEIA     Com Vítimas Fatais          5
3   AC  BRASILEIA     Com Vítimas Feridas         8
4   AC  BRASILEIA     Sem Vítimas                 2
5   AC  BUJARI        Com Vítimas Fatais          5
6   AC  BUJARI        Com Vítimas Feridas        65
7   AC  BUJARI        Sem Vítimas                26
47  TO  PARAISO DO    Sem Vítimas                59
47  TO  PEDRO AFONSO  Com Vítimas Feridas         4
47  TO  PEIXE         Com Vítimas Fatais         18
47  TO  PEIXE         Com Vítimas Feridas        23
47  TO  PIRAQUE       Com Vítimas Feridas         5
47  TO  PIRAQUE       Sem Vítimas                 1
47  TO  KENNEDY       Com Vítimas Fatais          6
47  TO  KENNEDY       Com Vítimas Feridas        25
47  TO  KENNEDY       Sem Vítimas                22
```

Any idea how to do this?

I've racked my brain over it, but I couldn't manage it.

Thanks.
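A possible approach, sketched on a subset of the grouped counts shown above: group the result again by `uf` and `classificacao_acidente` and take `max()` of `Total`. Calling `idxmax()` on the same grouping gives the row label of each maximum, so `loc` can recover which municipality reached it. The names `counts`, `max_per_class` and `top_rows` are illustrative only.

```python
import pandas as pd

# Subset of the grouped counts from the question (default 0..n-1 index).
counts = pd.DataFrame({
    "uf": ["AC"] * 8 + ["TO"] * 9,
    "municipio": [
        "ACRELANDIA", "ASSIS BRASIL", "BRASILEIA", "BRASILEIA", "BRASILEIA",
        "BUJARI", "BUJARI", "BUJARI",
        "PARAISO DO", "PEDRO AFONSO", "PEIXE", "PEIXE", "PIRAQUE", "PIRAQUE",
        "KENNEDY", "KENNEDY", "KENNEDY",
    ],
    "classificacao_acidente": [
        "Com Vítimas Feridas", "Sem Vítimas", "Com Vítimas Fatais", "Com Vítimas Feridas",
        "Sem Vítimas", "Com Vítimas Fatais", "Com Vítimas Feridas", "Sem Vítimas",
        "Sem Vítimas", "Com Vítimas Feridas", "Com Vítimas Fatais", "Com Vítimas Feridas",
        "Com Vítimas Feridas", "Sem Vítimas", "Com Vítimas Fatais", "Com Vítimas Feridas",
        "Sem Vítimas",
    ],
    "Total": [10, 6, 5, 8, 2, 5, 65, 26, 59, 4, 18, 23, 5, 1, 6, 25, 22],
})

keys = ["uf", "classificacao_acidente"]

# Maximum Total for each accident class within each UF.
max_per_class = counts.groupby(keys)["Total"].max()

# Index label of the row holding each maximum (first row wins on ties),
# used to pull out the full rows, including the municipality.
idx = counts.groupby(keys)["Total"].idxmax()
top_rows = counts.loc[idx].reset_index(drop=True)

print(max_per_class)
print(top_rows)
```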

Remove duplicate rows based on one column and keep the sum of another column's values (pandas DataFrame, Python)

I will import an Excel file and convert it to a DataFrame. I need to remove the rows that repeat a value in the Código column and, in the Longitud column, get the sum of all the rows that had that same value.

The Excel file I will import:

So in this case the result should be two rows: one with the code shaded in yellow and one with the code shaded in violet, where the Longitud column holds the sum of the 5 yellow rows and the sum of the 4 violet rows, respectively.

Best regards, and many thanks
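A minimal sketch with `groupby` and `sum`, assuming the columns are named `Código` and `Longitud` as in the question. Since the spreadsheet itself is not shown, the codes and lengths below are made up to mimic five rows with one code and four with another; with the real file, the DataFrame would come from `pd.read_excel` instead (the file name in the comment is hypothetical).

```python
import pandas as pd

# Made-up data standing in for the spreadsheet; with the real file this
# would be something like pd.read_excel("datos.xlsx") (hypothetical name).
df = pd.DataFrame({
    "Código": ["X100"] * 5 + ["Y200"] * 4,
    "Longitud": [1.5, 2.0, 3.0, 0.5, 1.0, 4.0, 2.5, 1.5, 2.0],
})

# One row per distinct Código, with Longitud summed over the rows sharing it.
resultado = df.groupby("Código", as_index=False)["Longitud"].sum()
print(resultado)
```

Any other columns are dropped by this aggregation; they would need their own rule (for example keeping the first value) passed to `agg`.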

How to parse an RDD into a DataFrame with dynamic types

I'm trying to parse an RDD[Seq[String]] into a DataFrame. Although it's a Seq of Strings, the values could have a more specific type such as Int, Boolean, Double, String and so on. For example, the lines could be:

```
"hello", "1", "bye", "1.1"
"hello1", "11", "bye1", "2.1"
...
```

Another execution could have a different number of columns.

The first column is always going to be a String, the second an Int, and so on, and it will always be that way. On the other hand, one execution could have Seqs of five elements and another could have 2000, so it depends on the execution. In each execution, the column names and types are defined.

To do it, I could have something like this:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// I could have a parameter to generate the StructType dynamically.
def getSchema(): StructType = {
  var schemaArray = scala.collection.mutable.ArrayBuffer[StructField]()
  schemaArray += StructField("col1", IntegerType, true)
  schemaArray += StructField("col2", StringType, true)
  schemaArray += StructField("col3", DoubleType, true)
  StructType(schemaArray)
}

// Seq of Any?? It doesn't seem the best option!!
val l1: Seq[Any] = Seq(1, "2", 1.1)
// One Row per inner Seq; here a single row built from l1.
val rdd1 = sc.parallelize(Seq(l1)).map(Row.fromSeq(_))

val schema = getSchema()
val df = sqlContext.createDataFrame(rdd1, schema)
df.show()
df.schema
```

I don't like having a Seq of Any at all, but it's really what I have. Is there another option?

On the other hand, since what I have is similar to a CSV, I was thinking I could create one. Spark has a library that reads a CSV and returns a DataFrame with inferred types. Is it possible to call it if I already have an RDD[String]?
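The idea of converting each string field according to the per-execution type information, instead of carrying a `Seq[Any]` around untyped, can be sketched in plain Python (no Spark involved). The type names (`"string"`, `"int"`, `"double"`, `"boolean"`) and the helper `convert_row` are hypothetical; in Spark, the equivalent conversion would run inside the `map` that builds each `Row`, with a `StructType` describing the same types.

```python
# Hypothetical type names for one execution, one per column.
column_types = ["string", "int", "string", "double"]

# Map each type name to a function that converts the raw string value.
CONVERTERS = {
    "string": str,
    "int": int,
    "double": float,
    "boolean": lambda s: s.strip().lower() == "true",
}


def convert_row(raw_row, types):
    """Convert one row of strings into typed values, column by column."""
    if len(raw_row) != len(types):
        raise ValueError("row length does not match the number of column types")
    return tuple(CONVERTERS[t](value) for t, value in zip(types, raw_row))


raw_rows = [
    ["hello", "1", "bye", "1.1"],
    ["hello1", "11", "bye1", "2.1"],
]
typed_rows = [convert_row(r, column_types) for r in raw_rows]
print(typed_rows)
```

Because the converters are chosen once per execution from the type list, the same code handles five columns or 2000 without changes.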

How to create a new datetime column in a pandas DataFrame from other columns holding the day, month and year

Starting from a DataFrame `rt` in Python with the following structure:

[Image of the DataFrame not included]

I want to create a new datetime column from the data in the "Mes" (month) and "Dia" (day) columns and the current year. I'm trying to do it with the following statement, but it fails:

```python
rt['Date'] = datetime.datetime(date.today().year, rt['Mes'], rt['Dia'])
```

It gives me this error:

```
Traceback (most recent call last):
  File "<ipython-input-28-55c5a330e1fb>", line 1, in <module>
    rt['Date']=datetime.datetime(date.today().year, rt['Mes'], rt['Dia'])
  File "C:\Users\Usuario\Anaconda3\lib\site-packages\pandas\core\series.py", line 118, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'int'>
```
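`datetime.datetime` expects plain integers, so passing whole Series (`rt['Mes']`, `rt['Dia']`) raises the `TypeError` above. A vectorized alternative, sketched here with made-up month and day values, is `pd.to_datetime` applied to a DataFrame whose columns are named `year`, `month` and `day`; the current year is broadcast to every row.

```python
import datetime

import pandas as pd

# Made-up month/day values standing in for the real DataFrame.
rt = pd.DataFrame({"Mes": [1, 3, 12], "Dia": [15, 2, 31]})

current_year = datetime.date.today().year

# pd.to_datetime can assemble dates from columns named year, month and day;
# the scalar year is repeated for every row.
parts = pd.DataFrame({"year": current_year, "month": rt["Mes"], "day": rt["Dia"]})
rt["Date"] = pd.to_datetime(parts)

print(rt)
```

Invalid combinations (for example day 30 in month 2) raise an error by default; `errors="coerce"` would turn them into `NaT` instead.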

Turn the items of a list into separate columns, or extend the DataFrame to the end

I have a class with an attribute that is a list, and I'm trying to display this list in a pandas DataFrame on a single row to represent the character's inventory.

Assigning the items to the list:

```python
if self.wealth == "rich":
    self.inventory = ["dagger", "nobles's clothing", "cloak", "backpack", "rations for a week", "waterskin",
                      "potion of healing", "pouch for coins", "personal servant", "personal guard", " three saddled horses"]
```

I'm building the DataFrame like this, but the list gets cut off because it's too long. I'd like to do it in a way that doesn't truncate this row.

```python
inventory = pd.DataFrame({"Inventory": [self.inventory], " ": " "})
inventory.set_index(" ", inplace=True)

display(inventory)
```
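Two possible sketches, using the inventory list from the question as a plain variable instead of `self.inventory`. Building the DataFrame from `[inventory]` turns each item into its own column on a single row. Alternatively, keeping the list in one cell and lifting pandas' display limits (`display.max_colwidth`, `display.max_columns`) inside `pd.option_context` stops the output from being truncated. Passing `None` for `display.max_colwidth` assumes pandas 1.0 or later.

```python
import pandas as pd

inventory = [
    "dagger", "nobles's clothing", "cloak", "backpack", "rations for a week", "waterskin",
    "potion of healing", "pouch for coins", "personal servant", "personal guard", " three saddled horses",
]

# Option 1: one column per item, all on a single row labelled "Inventory".
wide = pd.DataFrame([inventory], index=["Inventory"])

# Option 2: the whole list in a single cell, as in the question.
single = pd.DataFrame({"Inventory": [inventory]})

# Temporarily lift the display limits so neither layout is truncated when shown.
with pd.option_context("display.max_colwidth", None, "display.max_columns", None):
    print(wide)
    print(single)
```

In a notebook, `display(...)` can go inside the same `with` block in place of `print`.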

Writing a pandas DataFrame to a CSV file and renaming it in a for loop

I have a script that reads a SQL database into pandas DataFrames, which are then concatenated into one DataFrame in a loop. I need to write this second DataFrame to a CSV file and rename the file using a list of IDs.

I am using `DataFrame.to_csv` to write the file and `os.rename` to change the name.

```python
for X, df in d.iteritems():
    newdf = pd.concat(d)
    for X in newdf:
        export_csv = newdf.to_csv(r'/Users/uni/Desktop/corrindex+id/X.csv', index=False, header=None)
        for X in NAMES:
            os.rename('X.csv', X)
```

This is the code that concatenates the DataFrames. In the third loop, `NAMES = 'rt35'`, but in the future this will be a list of similar names.

I expect to get a file named rt35.csv. However, I get either r.csv or X.csv, and this error:

```
OSError: [Errno 2] No such file or directory
```

The files are written correctly; the only issue is the name.
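A few things in the loop above are consistent with these symptoms: the path passed to `to_csv` contains the literal text `X.csv` rather than the value of `X`; iterating over the string `'rt35'` yields single characters (`'r'`, `'t'`, `'3'`, `'5'`) rather than the whole name; and `os.rename('X.csv', X)` looks for `X.csv` relative to the current working directory, not the folder the file was written to. A sketch that writes straight to the desired name, so no `os.rename` is needed, assuming `NAMES` becomes a list. The input frames and the temporary output directory are stand-ins for the SQL results and `/Users/uni/Desktop/corrindex+id`.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for the DataFrames read from SQL (hypothetical data).
frames = {
    "part1": pd.DataFrame({"value": [1, 2]}),
    "part2": pd.DataFrame({"value": [3, 4]}),
}
NAMES = ["rt35"]  # a list of IDs, not the bare string "rt35"

# Stand-in for /Users/uni/Desktop/corrindex+id
out_dir = tempfile.mkdtemp()

# Concatenate once, outside any loop.
newdf = pd.concat(list(frames.values()), ignore_index=True)

written = []
for name in NAMES:
    # Build the file name from the variable's value, not the literal "X.csv".
    path = os.path.join(out_dir, f"{name}.csv")
    newdf.to_csv(path, index=False, header=False)
    written.append(path)

print(written)
```

If each ID should get its own data rather than the same concatenated frame, the loop would pair names with frames (for example with `zip`); the question does not say which is intended.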

Efficient way to get values from a DataFrame and append them to a new DataFrame

I have a DataFrame with about 200 million rows. Here is an example of the DataFrame:

```
date         query
29-03-2019   SELECT * FROM table WHERE ..
30-03-2019   SELECT * FROM ... JOIN ... ON ...WHERE ..
....         ....
20-05-2019   SELECT ...
```

I have a function that gets the table name(s) and attribute name(s) from the DataFrame above and appends them to a new DataFrame.

```python
import sqlparse
from sqlparse.tokens import Keyword, DML

def getTableName(sql):
    def getTableKey(parsed):
        findFrom = False
        wordKey = ['FROM', 'JOIN', 'LEFT JOIN', 'INNER JOIN', 'RIGHT JOIN', 'OUTER JOIN', 'FULL JOIN']
        for word in parsed.tokens:
            if word.is_group:
                for f in getTableKey(word):
                    yield f
            if findFrom:
                if isSelect(word):
                    for f in getTableKey(word):
                        yield f
                elif word.ttype is Keyword:
                    findFrom = False
                    StopIteration
                else:
                    yield word
            if word.ttype is Keyword and word.value.upper() in wordKey:
                findFrom = True

    tableName = []
    query = (sqlparse.parse(sql))
    for word in query:
        if word.get_type() != 'UNKNOWN':
            stream = getTableKey(word)
            table = set(list(getWord(stream)))
            for item in table:
                tabl = re.sub(r'^.+?(?<=[.])', '', item)
                tableName.append(tabl)
    return tableName
```

The function that gets the attributes is just like getTableName; the difference is the wordKey list.

The function that processes the DataFrame looks like this:

```python
import pandas as pd

def getTableAttribute(dataFrame, queryCol, date):
    tableName = []
    attributeName = []
    df = pd.DataFrame()
    for row in dataFrame[queryCol]:
        table = getTableName(row)
        tableJoin = getJoinTable(row)
        attribute = getAttribute(row)
        # append into list
        tableName.append(table + tableJoin)
        attributeName.append(attribute)
    df = dataFrame[[date]].copy()
    df['tableName'] = tableName
    df['attributeName'] = attributeName
    print('Done')
    return df
```

The result of the function is like this:

```
date        tableName  attributeName
29-03-2019  tableN     attributeM
30-03-2019  tableA     attributeB
....        ...        ...
20-05-2019  tableF     attributeG
```

Since this is my first attempt, I'd like an opinion on what I've tried, because my code runs slowly on large files.
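One common speed-up, sketched here under the assumption that the same SQL text repeats across the 200 million rows, is to parse each distinct query only once and then map the result back to every row. The helpers above also appear to parse the same SQL string separately (`getTableName`, `getJoinTable`, `getAttribute`), so extracting everything from a single parse per query would be a further option. In this sketch, `extract_tables` is a hypothetical regex stand-in for `getTableName`, used only so the example runs without sqlparse; it is not a substitute for real SQL parsing.

```python
import re

import pandas as pd

# Hypothetical stand-in for getTableName: a crude regex, for illustration only.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_tables(sql):
    """Return table names after FROM/JOIN, with any schema prefix removed."""
    return [name.split(".")[-1] for name in TABLE_RE.findall(sql)]


def get_table_names(frame, query_col):
    # Parse each distinct query once; identical queries reuse the cached result.
    lookup = {q: extract_tables(q) for q in frame[query_col].unique()}
    return pd.Series([lookup[q] for q in frame[query_col]], index=frame.index)


logs = pd.DataFrame({
    "date": ["29-03-2019", "30-03-2019", "31-03-2019"],
    "query": [
        "SELECT * FROM sales.orders WHERE id = 1",
        "SELECT * FROM customers c JOIN orders o ON c.id = o.cid",
        "SELECT * FROM sales.orders WHERE id = 1",
    ],
})

result = logs[["date"]].copy()
result["tableName"] = get_table_names(logs, "query")
print(result)
```

The gain depends on how many rows share identical query text; if almost every query is unique, the per-query parsing cost stays the same.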

Plot a DataFrame against the date (index) and a column of the same DataFrame

I have a table with one column and dates as the index:

```python
print(graf_itog1)
```

```
            Шайбы
Дата
2017-08-21      4
2017-08-23      7
2017-08-25      2
2017-08-27      4
2017-09-02      4
2017-09-04      5
2017-09-06      3
2017-09-08      7
2017-09-11      3
2017-09-13      6
2017-09-15      4
2017-09-17      5
2017-09-21      3
2017-09-23      9
2017-09-25      4
2017-09-27      3
2017-10-02      6
2017-10-04      6
2017-10-06      3
2017-10-09      4
2017-10-11      2
2017-10-13      4
2017-10-15      2
2017-10-17      5
2017-10-21      4
2017-10-23      5
2017-10-25      4
2017-10-27      5
2017-10-31      3
2017-11-02      1
2017-11-04      5
2017-11-14      2
2017-11-16      3
2017-11-18      8
2017-11-21      4
2017-11-27      9
2017-11-29      3
2017-12-02      4
2017-12-04      3
2017-12-06      3
2017-12-08      1
2017-12-19      2
2017-12-23      5
2017-12-25      4
2017-12-27      3
2017-12-29      3
2018-01-05      6
2018-01-07      5
2018-01-09      4
2018-01-11      1
2018-01-16      4
2018-01-18      7
2018-01-20      5
2018-01-22      2
2018-02-27      3
2018-03-01      1
```

When I plot this DataFrame with

```python
graf_itog1.plot()
plt.show()
```

I get this error:

```
builtins.IndexError: list index out of range
```

What am I doing wrong? Please advise.
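The printed table alone does not reveal what triggers the `IndexError`. A defensive sketch, assuming (hypothetically) that the dates and counts might have been loaded as strings: convert the index to a `DatetimeIndex` with `pd.to_datetime`, make the column numeric with `pd.to_numeric`, and name the column to plot explicitly with `y=`. Only the first five rows of the table are used, and the non-interactive `Agg` backend is selected so the sketch also runs without a display; interactively, `plt.show()` would follow as in the question.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the table, deliberately stored as strings (an assumption).
graf_itog1 = pd.DataFrame(
    {"Шайбы": ["4", "7", "2", "4", "4"]},
    index=pd.Index(
        ["2017-08-21", "2017-08-23", "2017-08-25", "2017-08-27", "2017-09-02"],
        name="Дата",
    ),
)

# Make the index real dates and the column numeric before plotting.
graf_itog1.index = pd.to_datetime(graf_itog1.index)
graf_itog1["Шайбы"] = pd.to_numeric(graf_itog1["Шайбы"])

ax = graf_itog1.plot(y="Шайбы", marker="o")
plt.close(ax.figure)  # free the figure; use plt.show() instead when interactive
```

Checking `graf_itog1.dtypes` and `type(graf_itog1.index)` on the real data would show whether such conversions are needed at all.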