[fix] Fix possible data read errors for the inlineStr type #3861

Open
wants to merge 5 commits into
base: master

psxjoy
Collaborator

@psxjoy psxjoy commented Jul 3, 2024

Related issues:

#3860
#3823

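Reason for the fix:

  1. In an Excel XML file, inside a <c> tag marked as type inlineStr, the <v> tag is invalid, erroneous data. Both WPS and Office filter this tag out by default and correct the error.
  2. When an Excel file is read with POI, POI also actively filters out the <v> tag inside a <c> tag marked as inlineStr.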
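
To make the cell layout concrete, here is a minimal sketch using only the JDK's DOM parser, not EasyExcel or POI code. The sample cell XML, the class name `InlineStrCellDemo`, and the method `readInlineStr` are all hypothetical and exist only for illustration. It reads the text from `<is>`/`<t>` and ignores a stray `<v>`, which is the filtering behaviour described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class InlineStrCellDemo {
    // Hypothetical sample: an inlineStr cell that also carries a stray <v> element.
    static final String SAMPLE =
        "<c r=\"A1\" t=\"inlineStr\"><v>12345</v><is><t>hello</t></is></c>";

    /** Returns the text of an inlineStr cell, reading only <is>/<t> and ignoring <v>. */
    static String readInlineStr(String cellXml) {
        try {
            Element cell = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(cellXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            StringBuilder text = new StringBuilder();
            NodeList children = cell.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Only the <is> (inline string) child contributes text; <v> is skipped.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "is".equals(child.getNodeName())) {
                    NodeList runs = ((Element) child).getElementsByTagName("t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                }
            }
            return text.toString();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers stay simple in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readInlineStr(SAMPLE));
    }
}
```

With the sample above, only the `<t>` text is returned; the content of `<v>` never reaches the result.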

Code logic:

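  1. In CellDataTypeEnum, str now maps to ERROR
    The str tag is a normal String tag; in the XML file it can hold most tag data. Previously str mapped to DIRECT_STRING, which caused many rule mismatches.
    After the change, ERROR and DIRECT_STRING follow the same value-extraction rule, so the two are directly compatible.
  // CellTagHandler -> ERROR and DIRECT_STRING follow the same value-extraction rule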
  case DIRECT_STRING:
  case ERROR:
      tempCellData.setStringValue(tempDataString);
      tempCellData.setType(CellDataTypeEnum.STRING);
      break;
  2. Rule check in SAX
    In the XlsxRowHandler method public void characters(char[] ch, int start, int length), inlineStr is checked: if the current data type is DIRECT_STRING and the current tag is <v>, the content is skipped without being parsed.
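
The two steps above can be sketched with a plain JDK SAX handler. This is an illustrative approximation under stated assumptions, not the code from this PR: the class `InlineStrSaxSketch`, the nested `CellType` enum (a simplified stand-in for CellDataTypeEnum), the `typeOf` mapping, and the sample XML are all hypothetical. It assumes inlineStr maps to DIRECT_STRING and str maps to ERROR, as described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class InlineStrSaxSketch extends DefaultHandler {
    // Simplified stand-in for CellDataTypeEnum; only the cases discussed here.
    enum CellType { DIRECT_STRING, ERROR, OTHER }

    private final Deque<String> tagStack = new ArrayDeque<>();
    private final StringBuilder value = new StringBuilder();
    private CellType type = CellType.OTHER;

    // Assumed mapping of the t attribute: inlineStr -> DIRECT_STRING, str -> ERROR.
    static CellType typeOf(String t) {
        if ("inlineStr".equals(t)) return CellType.DIRECT_STRING;
        if ("str".equals(t)) return CellType.ERROR;
        return CellType.OTHER;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tagStack.push(qName);
        if ("c".equals(qName)) {
            // A new cell starts: record its type and reset the buffer.
            type = typeOf(atts.getValue("t"));
            value.setLength(0);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        tagStack.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String current = tagStack.peek();
        // The rule check: inside an inlineStr cell, text under <v> is skipped.
        if (type == CellType.DIRECT_STRING && "v".equals(current)) {
            return;
        }
        if ("v".equals(current) || "t".equals(current)) {
            value.append(ch, start, length);
        }
    }

    /** Parses a single <c> element and returns the collected cell text. */
    static String read(String cellXml) {
        try {
            InlineStrSaxSketch handler = new InlineStrSaxSketch();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(cellXml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.value.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read("<c t=\"inlineStr\"><v>12345</v><is><t>hello</t></is></c>"));
        System.out.println(read("<c t=\"str\"><v>plain text</v></c>"));
    }
}
```

In this sketch the skip applies only to DIRECT_STRING cells, so a str cell (ERROR in the assumed mapping) still takes its value from `<v>`, which is why the mapping change and the skip rule need to go together.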

@psxjoy psxjoy changed the title from "fix-Fix possible data read errors for the inlineStr type" to "[fix] Fix possible data read errors for the inlineStr type" Jul 17, 2024
Successfully merging this pull request may close these issues.

Question about inlineStr parsing
When reading xlsx, same as what is displayed: some data has a random prefix