Fixing accents errors in excel #14
Comments
Hi Paula, this may not be the best way to do it, but I think I managed to open it with:
Results here: Is this what you want?
Many thanks Albert.
Hmmm, I don't know VBA, but if the only remaining problem is the "broken rows", it can be fixed in other programming languages. I think the main cause is an incorrect page break in columns 2 and 6. Here is an example using R:

library(readxl)
library(dplyr)
library(stringr)

file <- "https://github.com/ucl-ihi/CodeClub/files/3530647/certificados_cnrm_AM.xlsx"
tmp <- tempfile(fileext = ".xlsx")
httr::GET(url = file, httr::write_disk(tmp))
df <- read_excel(tmp, skip = 1)

df_neat <- df %>%
  # remove columns 3 & 13 (importing issue)
  select(-3, -13) %>%
  # concatenate the values in columns 2 & 6 with the next row's value
  # if the next row's value in column 1 (ID) is missing (NA);
  # skip the paste when the next row has nothing to append, so that
  # the literal text "NA" is never glued onto a value
  mutate_at(vars(2, 6),
            ~if_else(is.na(lead(df[[1]])) & !is.na(lead(.)),
                     paste(., lead(.)), .)) %>%
  # remove duplicated whitespace and \n characters
  mutate_if(is.character, str_squish) %>%
  # remove rows with a missing column 1 (ID) value
  filter(!is.na(.[[1]]))

# Fix column names
old_cols <- colnames(df) %>% str_squish
new_cols <- c(old_cols[1],
              str_split(old_cols[3], " ")[[1]],
              old_cols[11], old_cols[13])
colnames(df_neat) <- new_cols

writexl::write_xlsx(df_neat, "OUTFILE.xlsx")

Results:
You may need to reformat some of the columns in Excel (e.g. change the Date column format to "Short Date") to get the output that you want.
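The row-merging idea in the R example above can be tried on a tiny made-up table before running it on the real file. This is only a sketch: the `toy` data frame and its `id` and `name` columns are hypothetical, and it assumes a broken row always appears as a follow-on row with a missing ID. It also includes a guard, `!is.na(lead(name))`, so that nothing is pasted when there is no continuation text.

```r
library(dplyr)
library(stringr)

# Hypothetical toy table: the second row is the spilled-over tail of the first
toy <- tibble(
  id   = c(1, NA, 2),
  name = c("Maria da", "Silva", "Ana  Souza")
)

toy_neat <- toy %>%
  # append the next row's text when that row has no ID but does have text
  mutate(name = if_else(is.na(lead(id)) & !is.na(lead(name)),
                        paste(name, lead(name)), name)) %>%
  # collapse repeated whitespace
  mutate(across(where(is.character), str_squish)) %>%
  # drop the continuation rows, which have no ID
  filter(!is.na(id))

print(toy_neat)
```

After this runs, `toy_neat` has two rows, with the names "Maria da Silva" and "Ana Souza".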
It might be easier to work with the PDF. There are various ways to extract tables from a PDF, e.g. https://pdftables.com, and there is plenty of advice if you Google!
I need to download some text data in Portuguese, which contains accented characters (e.g. ã, í, é, ...). The website lets me download the file as either Excel or PDF. I want the data in Excel, but the Excel file shows the accents incorrectly, whereas the PDF does not (both files are attached). Would anyone know how to fix such errors in Excel?
Many thanks
Paula
certificados_cnrm_AM.xls.xlsx
certificados_cnrm_AM.pdf
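If the accent errors are of the common kind where UTF-8 text has been decoded as Latin-1 (so that "ã" shows up as "Ã£"), one possible repair in R is sketched below. That cause is an assumption and has not been confirmed for the attached file. The sample string is made up for illustration.

```r
# Hypothetical garbled value: "certificação" whose UTF-8 bytes were read as Latin-1
# (written with \u escapes so the script does not depend on the file's own encoding)
garbled <- "certifica\u00c3\u00a7\u00c3\u00a3o"

# Step 1: convert back to Latin-1, which recovers the original raw bytes
fixed <- iconv(garbled, from = "UTF-8", to = "latin1")

# Step 2: declare those bytes to be what they really are, UTF-8
Encoding(fixed) <- "UTF-8"

print(fixed)
```

If the text in the real file is damaged in some other way (for example, characters replaced by "?"), this round trip will not recover it, and the PDF may be the better source.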