I am trying to find the best way to filter my data frame. I have the following data sample:
   name                        spouse
1  Adanel                      Belemir
19 Bodo Proudfoot
9  Angrim
17 Nurwë
25 Tar-Telemmaitë              Unnamed wife
23 Tar-Vanimeldë               Herucalmo
5  Fire-drake of Gondolin
22 Tarannon Falastur           Berúthiel
2  Boromir
10 Anárion                     Unnamed wife
15 Angelimar
11 Ar-Pharazôn                 Tar-Míriel
12 Ar-Sakalthôr                Unnamed wife
24 Tar-Telperiën               None
6  Ar-Adûnakhôr                Unnamed wife
16 Angbor
3  Lagduf
4  Tarcil                      Unnamed wife
8  Angrod                      Eldalótë
18 Linda (Baggins) Proudfoot
7  Annael
21 Pengolodh
13 Ar-Gimilzôr                 Inzilbêth
20 Penlod
14 Angelimir                   Unnamed wife
To clean the data, I want to drop the rows where spouse is empty and the rows that hold a placeholder instead of a real name, such as “Unnamed wife”, “none”, “None known” and others. I am using the following code:
interacialMariage <- data %>%
  filter(spouse != "",
         spouse != "Unnamed wife",
         spouse != grepl("none", ignore.case = TRUE, spouse)) %>%
  select(name, spouse)
I would like to know if there is any “contains” function that can help me filter out all values that start with “None”, for example.
I’ve tried the grepl() function, but it didn’t work. If I write a separate (spouse != case) condition for each case it works, but I don’t think that is the best way to do it.
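For reference, here is a minimal sketch of the pattern-based filter I am aiming for. It assumes that grepl() should be used directly as a logical condition and negated with `!`, rather than compared against spouse with `!=`. The small data frame and the `placeholder_pattern` name are made up for illustration, and the pattern only covers the placeholders I have seen so far.

```r
library(dplyr)

# Hypothetical sample with a few rows in the same shape as my data
data <- data.frame(
  name   = c("Adanel", "Angrim", "Tarcil", "Tar-Telperiën", "Tar-Vanimeldë"),
  spouse = c("Belemir", "", "Unnamed wife", "None", "Herucalmo"),
  stringsAsFactors = FALSE
)

# Assumed pattern: values that start with "none" or "unnamed", in any case
placeholder_pattern <- "^(none|unnamed)"

interacialMariage <- data %>%
  filter(
    spouse != "",                                             # drop empty spouses
    !grepl(placeholder_pattern, spouse, ignore.case = TRUE)   # drop placeholders
  ) %>%
  select(name, spouse)

print(interacialMariage)
```

grepl() returns a logical vector with one TRUE/FALSE per row, so negating it keeps only the rows that do not match. New placeholders could be added to the alternation inside the pattern instead of writing another `!=` condition.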
Is there a way to optimize the way that I filter this data?
Thanks!