v0.16.4 (2024-10-18)
--------------------
- Fixed a recently introduced bug that made Pandas required for the library to work.
v0.16.3 (2024-10-18)
--------------------
- Added a new `skip_metadata` parameter to the `DwCAReader` constructor. When set to `True`, the metadata file is not read. This can be useful, for example, if the metadata is corrupt (invalid XML) and you don't need it.
v0.16.2 (2024-08-23)
--------------------
- Fix a packaging issue that prevented the release of v0.16.1 on PyPI.
v0.16.1 (2024-08-23)
--------------------
- Added official support for Python 3.12
- Proper error message when trying to use an unsupported combination of Pandas option and archives with default values (issue #106).
v0.16.0 (2023-11-13)
--------------------
- Dropped Python 3.5 and 3.6 support. Use Python 3.7 to 3.11 instead.
- Added a star record iterator to DwCAReader (thanks to @csbrown)
v0.15.1 (2023-01-17)
--------------------
- Workaround for errors with buggy GBIF downloads, see https://github.com/gbif/portal-feedback/issues/4533
v0.15.0 (2020-09-09)
--------------------
- Lower memory use for archives with extensions
- GBIFResultsReader (deprecated since v0.10.0) is now removed
- DwCAReader.get_row_by_id() (long deprecated) is now removed, use DwCAReader.get_corerow_by_id() instead
- DwCAReader.get_row_by_index() (long deprecated) is now removed, use DwCAReader.get_corerow_by_position() instead
v0.14.0 (2020-04-27)
--------------------
- Dropped support for Python 2.7
- API new: Temporary directory is now configurable
v0.13.2 (2019-09-20)
--------------------
- Better standard support: fields with data column can also have a default value (issue #80)
v0.13.1 (2018-08-30)
--------------------
- API new: DwCAReader.core_file_location
- API new: String representation (__str__) for CSVDataFile objects.
- API change: CSVDataFile.get_line_at_position() raises IndexError in case of line not found (previously: returned None)
v0.13.0 (2017-12-01)
--------------------
- Bugfix: better encoding support for Metadata file - see issue #73.
- API change: DwCAReader.get_descriptor_for(filename) now raises a NotADataFile exception if filename doesn't exist (previously: None was returned).
- API new: DwCAReader.core_file (previously private: _corefile).
- API new: DwCAReader.extension_files (previously private: _extensionfiles).
v0.12.0 (2017-11-10)
--------------------
- API new: DwCAReader.pd_read() - See Pandas tutorial for usage.
- API new: new NotADataFile exception.
v0.11.2 (2017-10-18)
--------------------
- API new: DwCAReader.get_descriptor_for()
v0.11.1 (2017-10-11)
--------------------
- API new: DataFileDescriptor.short_headers
v0.11.0 (2017-10-10)
--------------------
- Bugfix: An enclosed field can now contain the field separator (By Ben Cail)
- API change: DwCAReader.get_row_by_id() is renamed to DwCAReader.get_corerow_by_id()
- API change: DwCAReader.get_row_by_index() is renamed to DwCAReader.get_corerow_by_position()
- API new: DwCAReader.orphaned_extension_rows() (thanks to Pieter Provoost).
- API new: CSVDataFile.coreid_index (was previously known as CSVDataFile._coreid_index).
- API new: Row/CoreRow/ExtensionRow.position
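
The enclosed-field bugfix listed for this release can be illustrated with a minimal sketch. It uses the standard library `csv` module rather than the reader's own parsing code, and the sample data is made up for the illustration:

```python
import csv
import io

# Hypothetical sample data: the second field of the data row is enclosed in
# double quotes and contains the field separator (a comma).
raw = 'id,locality,country\n1,"Brussels, Grand Place",BE\n'

reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
rows = list(reader)

# The enclosed field is kept whole instead of being split at the inner comma.
print(rows[1])
```

The point is only that the separator inside the enclosing characters is treated as data, so the row still has three fields.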
v0.10.2 (2017-04-11)
--------------------
- experimental support for Windows.
v0.10.1 (2017-04-04)
--------------------
- Fixed temporary directory location: previously, it was always created under the current directory instead of a location chosen by Python, such as /tmp.
v0.10.0 (2017-03-16)
--------------------
- GBIFResultsReader is now deprecated.
- API new: DwCAReader now provides source_metadata for GBIF-like archives (previously the main perk of GBIFResultsReader)
- API change: dwca.source_metadata is an empty dict (previously: None) when the archive doesn't have source metadata
- API new: dwca.utils._DataFile is now public and renamed to dwca.files.CSVDataFile.
v0.9.2 (2016-04-29)
-------------------
- Updated Darwin Core terms for the qualname helper (including Event Core).
- Updated dev. script (https://github.com/BelgianBiodiversityPlatform/python-dwca-reader/issues/45).
v0.9.1 (2016-04-28)
-------------------
- API new: DwCAReader.open_included_file(relative_path).
- InvalidArchive exception is now raised when a file descriptor references a non-existing field.
v0.9.0 (2016-04-05)
-------------------
- Support for new types of archives:
* DwCA without metafile (see page 2 of http://www.gbif.org/resource/80639), with or without a Metadata document (EML.xml). Fixes #47 and #7.
* DwCA where data fields are enclosed by some character (using fieldsEnclosedBy when the Archive provides a Metafile, autodetection otherwise). Fixes issue #53.
* Archives without a metadata attribute in the Metafile. See #51.
* Tgz archives.
- API change: SectionDescriptor => DataFileDescriptor
- API change: DataFileDescriptor.encoding => DataFileDescriptor.file_encoding
- API change: the reader previously only supported zip archives, and raised BadZipFile when the file format was not understood. It now also supports .tgz, and raises InvalidArchive when the archive cannot be opened as either .zip or .tgz.
- API new: DataFileDescriptor.fields_enclosed_by
- API new: DwCAReader.use_extensions
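
The zip-then-tgz fallback described in this release can be sketched roughly as follows. This is not the library's implementation: `detect_archive_format` is a hypothetical helper, and the `InvalidArchive` class below is a local stand-in for the library's exception of the same name.

```python
import tarfile
import zipfile


class InvalidArchive(Exception):
    """Local stand-in for the library's exception of the same name."""


def detect_archive_format(path):
    """Return 'zip' or 'tgz'; raise InvalidArchive if the file is neither."""
    if zipfile.is_zipfile(path):
        return "zip"
    try:
        # "r:gz" only accepts gzip-compressed tar files.
        with tarfile.open(path, "r:gz"):
            return "tgz"
    except (tarfile.TarError, OSError, EOFError):
        raise InvalidArchive(
            f"{path} cannot be opened as .zip nor as .tgz"
        ) from None
```

Callers then only need to handle a single exception type, whatever the underlying format problem is.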
v0.8.1 (2016-03-10)
-------------------
- Support for archives contained in a single (sub)directory. See https://github.com/BelgianBiodiversityPlatform/python-dwca-reader/issues/49
v0.8.0 (2016-02-11)
-------------------
- Experimental support for Python 3.5 (while maintaining compatibility with 2.7)
v0.7.0 (2015-08-20)
-------------------
- Python-dwca-reader no longer relies on BeautifulSoup/lxml, using ElementTree from the standard library instead for its XML parsing needs. This has a series of consequences:
* It should be easier to install and deploy (also on platforms such as Jython).
* All methods and attributes that used to return BeautifulSoup objects will now return xml.etree.ElementTree.Element instances. This includes DwCAReader.metadata. SectionDescriptor.raw_beautifulsoup and ArchiveDescriptor.raw_beautifulsoup have been replaced by SectionDescriptor.raw_element and ArchiveDescriptor.raw_element (Element objects).
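
As a rough illustration of what working with Element instances looks like, here is a minimal sketch using only the standard library. The XML snippet is a made-up, heavily trimmed EML-like document, not output from the library:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EML-like metadata document.
eml = """<eml>
  <dataset>
    <title>Sample occurrences</title>
    <creator><organizationName>Example Org</organizationName></creator>
  </dataset>
</eml>"""

root = ET.fromstring(eml)           # an xml.etree.ElementTree.Element
title = root.find("dataset/title")  # also an Element, or None if absent

print(type(root).__name__, title.text)
```

Code that previously navigated BeautifulSoup objects would use `find()`, `findall()` and `.text` on such elements instead.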
v0.6.5 (2015-08-18)
-------------------
- New InvalidArchive exception. Currently, it is only raised when a DwC-A references a metadata file that's not present in the archive.
v0.6.4 (2015-02-17)
-------------------
- Performance: an optional 'extension_to_ignores' parameter (List) can be passed to DwCAReader's constructor. In cases where an archive contains large but unneeded extensions, this can greatly reduce memory consumption. A typical use case would be the huge 'verbatim.txt' contained in GBIF downloads.
v0.6.3 (2015-02-16)
-------------------
- Performance: we now use core_id based indexes for extensions. There's a memory penalty, but extension file parsing is now only done once.
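
The idea of a core_id based index can be sketched as below. The data and the `build_coreid_index` helper are hypothetical and only illustrate the trade-off mentioned above: the extension rows are scanned once, and the resulting index is kept in memory.

```python
from collections import defaultdict

# Hypothetical extension rows as (core_id, value) pairs, in file order.
extension_rows = [("1", "photo-a"), ("2", "photo-b"), ("1", "photo-c")]


def build_coreid_index(rows):
    """Map each core id to the positions of the extension rows referencing it."""
    index = defaultdict(list)
    for position, (core_id, _value) in enumerate(rows):
        index[core_id].append(position)
    return dict(index)


# Built once (single pass over the extension data)...
index = build_coreid_index(extension_rows)

# ...then every lookup for a core row is a dictionary access, not a re-scan.
related = [extension_rows[pos] for pos in index.get("1", [])]
print(related)
```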
v0.6.2 (2015-01-26)
-------------------
- Better performance with extensions.
v0.6.1 (2015-01-09)
-------------------
- It can now open unzipped (directory) Darwin Core Archives
- More testing for Descriptor classes.
- Better respect of the standard (http://rs.tdwg.org/dwc/terms/guides/text/):
* We now support default value (\n) for linesTerminatedBy and fieldsTerminatedBy.
- Lower memory use with large archives.
v0.6.0 (2014-08-08)
-------------------
- Better performance thanks to a better architecture
- API add: brand new _ArchiveDescriptor and _SectionDescriptor
- API change: DwCAReader.descriptor is an instance of _ArchiveDescriptor (previously BeautifulSoup)
- API remove: DwCAReader.core_rowtype (use DwCAReader.descriptor.core.type instead)
- API remove: DwCAReader.extensions_rowtype (use DwCAReader.descriptor.extensions_type instead)
- API remove: DwCAReader.core_terms (use DwCAReader.descriptor.core.terms instead)
v0.5.1 (2014-08-05)
-------------------
- Performance: dramatically improved performance of get_row_by_index/looping for large files by
building an index of line positions at file opening (so there's a slight overhead
there)
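
A line-position index of this kind can be sketched as follows. This is an illustration under assumptions, not the library's code: `build_line_index` and `read_line_at` are hypothetical helpers, and the sample file is made up.

```python
import os
import tempfile


def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            if not f.readline():
                break
            offsets.append(offset)
    return offsets


def read_line_at(path, offsets, position):
    """Seek directly to a line; raises IndexError for an unknown position."""
    with open(path, "rb") as f:
        f.seek(offsets[position])
        return f.readline().decode("utf-8").rstrip("\n")


# Hypothetical tab-separated data file with a header and two rows.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("id\tname\n1\tAves\n2\tMammalia\n")

offsets = build_line_index(tmp.name)       # one pass when the file is opened
line = read_line_at(tmp.name, offsets, 2)  # direct access afterwards
os.remove(tmp.name)
print(offsets, repr(line))
```

Building the index costs one full pass over the file, which corresponds to the slight overhead at opening mentioned above.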
v0.5.0 (2014-01-21)
-------------------
- API new: DwCAReader.descriptor
- API change: "for core_line in dwca.each_line():" => "for core_row in dwca:"
- API change: from_core and from_extension attributes of DwCALine (and subclasses) have been removed.
The isinstance built-in function can be used to test if a line is an instance of DwCACoreLine
or of DwCAExtensionLine.
- API change: DwCAReader.lines => DwCAReader.rows
- API change: DwCACoreLine => CoreRow
- API change: DwCAExtensionLine => ExtensionRow
- API change: DWCAReader.get_line_by_id => DWCAReader.get_row_by_id
- API change: DWCAReader.get_line_by_index => DWCAReader.get_row_by_index
- API change: DwCAReader.get_row_by_* methods raise RowNotFound on failure instead of returning None
- Cleaner code and better documentation.
v0.4.0 (2013-09-24)
-------------------
- API change: dwca.get_line() -> dwca.get_line_by_id()
- API new: dwca.get_line_by_index()
- (Core File) iteration order is now guaranteed for dwca.each_line()
- Refactoring: DwCALine subclassed (as DwCACoreLine and DwCAExtensionLine)
v0.3.3 (2013-09-05)
-------------------
- DwCALines are now hashable.
v0.3.2 (2013-08-28)
-------------------
- API: added the dwca.absolute_temporary_path() method.
v0.3.1 (2013-08-09)
-------------------
- Bugfix: lxml added as a requirement.
v0.3.0 (2013-08-08)
-------------------
- XML parsing (metadata, EML, ...) now uses BeautifulSoup 4 instead of v3.
v0.2.1 (2013-08-02)
-------------------
- Added a property (core_terms) to DwCAReader to get a list of the DwC terms in use in the core file.
v0.2.0 (2013-07-31)
-------------------
- Specific support for GBIF Data portal (occurrences) export.
- Small bug fixes.
v0.1.1 (2013-05-28)
-------------------
- Fixes packaging issues.
v0.1.0 (2013-05-28)
-------------------
- Initial release.