forked from armijnhemel/binaryanalysis
ChangeLog
CHANGELOG HAS MOVED TO GITHUB
2016-09-12:
* move configs directory to bat-data directory, remove from the default BAT
distribution
* busybox.py: change the path to look for the BusyBox pickles
* tagging BAT 27
2016-09-11:
* guireport.py/generatereports.py: don't keep results in memory unnecessarily
but write them to a file earlier
* prerun.py: tag compiled terminfo files
* bat-scan.config: add configuration for compiled terminfo files
* prerun.py: expand timezone file checks
2016-09-09:
* prerun.py: extra sanity checks to prevent unnecessary calls to xmllint
* guireport.py: use more efficient string concat for duplicates. In case there
are many duplicate entries (one test archive seen had 48000+ duplicate files)
this can save a lot of time
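A minimal sketch of the kind of string concatenation change described for guireport.py above. The function name and output format are hypothetical and not taken from BAT; the point is only that collecting fragments and joining them once avoids repeatedly copying a growing string.

```python
def render_duplicates(paths):
    """Build one report string from many duplicate file paths.

    Hypothetical helper: the real guireport.py code is not shown in this
    ChangeLog, so the name and the output format are assumptions.
    """
    # Collect the pieces first; str.join() then builds the result once
    # instead of creating a new, longer string on every '+=' step.
    parts = []
    for path in paths:
        parts.append("duplicate: %s\n" % path)
    return "".join(parts)


# Roughly the size mentioned in the entry above (48000+ duplicates).
report = render_duplicates("file%d" % i for i in range(48000))
```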
2016-09-08:
* bruteforcescan.py: avoid building up temporary lists in postrun scans
* bruteforcescan.py: avoid building up temporary lists when processing
unpackscan results
* generatejson.py: avoid building up temporary lists
* prerun.py: extra sanity checks to prevent unnecessary calls to xmllint
2016-09-07:
* fwunpack.py: extra sanity check for broken GIF files
* fwunpack.py: clean up in case tar barfs, but some data was unpacked
* fwunpack.py: skip double entries in tar files
* fwunpack.py: extra sanity check for base64 files
* fwunpack.py: remove symbolic links to directories in squashfs unpacking in
case it fails but some data was unpacked
* fwunpack.py: fix for broken symbolic links in romfs files
2016-08-30:
* fwunpack.py: add many more sanity checks for GIF. Extract XMP data for GIF
(not stored yet)
2016-08-29:
* busybox.py: make the BusyBox version number extraction a little bit more
robust
2016-08-24:
* storeresult.py: add origin field for passwords
* bruteforcescan.py: compare files to earlier runs of BAT and record TLSH
distance
* bruteforcescan.py: fix issue where in some cases files were not tagged
properly
* bruteforcescan.py: record exact matches to earlier BAT scans
2016-08-23:
* add storeresult.py to store results of a previous BAT run in the database
2016-08-22:
* bruteforcescan.py: add TLSH hashing for binary files, record more hashes in
unpackreports (if available)
* licenseversion.py: don't build up a list of variables and fields for every
field/variable that is looked at. Instead, use a set() that is created once
per method invocation.
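The licenseversion.py entry above can be illustrated roughly as follows. This is a sketch under assumptions: `count_known` and its parameters are hypothetical names, not BAT code. It only shows the general idea of building a set once per call rather than rebuilding a list for every item that is checked.

```python
def count_known(candidates, known_names):
    """Count how many candidate identifiers occur in known_names.

    Hypothetical helper, not part of BAT.
    """
    # Build the lookup structure once per invocation. Membership tests on
    # a set take constant time on average, unlike a scan over a list.
    known = set(known_names)
    return sum(1 for name in candidates if name in known)


matches = count_known(["printk", "foo", "kmalloc"], ["kmalloc", "printk"])
```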
2016-08-21:
* bruteforcescan.py: record some more statistics about the total run time of a
scan
* bruteforcescan.py: record for each blacklist entry by which scan it was set
* fwunpack.py: rework cramfs sanity checks, add more sanity checks for cramfs
* fwunpack.py: tag ext2/3/4 files
* createdb.py/postgresql-*.sql: add "blacklist" database for known files that
should be ignored (example: broken ext2 file systems that are present as test
data in e2fsprogs)
2016-08-19:
* createdb.py: check entries in the manifest before unpacking an archive. This
could save quite a bit of I/O in the case of big archives (Linux kernel,
LibreOffice)
* bruteforcescan.py: record some runtime statistics about each phase,
individual aggregate scans and the total amount spent on scanning (except for
writing the dump file)
* remove jdserialize from bat-extratools-java
* identifier.py: also record references to source code file names in the Linux
kernel ending in .S
2016-08-14:
* fwunpack.py: add zisofs support to ISO9660 unpacking
* docs: better document JSON
2016-08-12:
* bat-unyaffs: add another chunk/spare combination observed in the wild
* bruteforcescan.py: make sure the top level element is tagged as such
* fwunpack.py: correct offset for ext2 feature flags for sparse_super check
2016-08-11:
* fwunpack.py: ext2 superblock checks need different offsets if blocksize ==
1024
* fwunpack.py: TTF sanity checks to make sure that fonts that are not 4 byte
aligned don't trigger a crash, plus that some checks don't read beyond where
they should read in case multiple files are concatenated and some fonts are
not 4 byte aligned.
* bruteforcescan.py: reenable scanfirst, but not for lzma
* bruteforcescan.py: import leaf scans only once per thread instead of once
per file
2016-08-10:
* createdb.py: fixes for website entries in case there is no DOWNLOADURL file
2016-08-09:
* postgresql-table.sql: add column to archivealias table
* createdb.py: use website column (if present) in 'processed' table and also
record archive aliases
2016-08-08:
* fwunpack.py: use deque and popleft() for list of trailers in JPEG unpacking
instead of a list
* fwunpack.py: don't reopen files in compress checking
* fwunpack.py: set maximum size for JPEG data (hardcoded to 100 MiB right now)
* fwunpack.py: several sanity checks for JPEG checking
* prerun.py: hardcode a few filters to avoid passing around large dicts of
offsets
* createdb.py: start processing website URLs for packages
2016-08-07:
* fwunpack.py: add some LZMA sanity checks
* fwunpack.py: check for JPEG APP2 ICC profile, many extra JPEG sanity checks
* fwunpack.py: don't keep opening files in compress unpacking
* fwunpack.py: start reworking JPEG unpacking, add many more sanity checks
2016-08-06:
* bruteforcescan.py: pass database connection and cursor to postrun scans,
rewrite to use a queue instead of a pool of processes
* guireport.py/images.py/generatehexdump.py: use new interface for postrun
scans
* fwunpack.py: minimally check validity of ext2 superblock copies
2016-08-05:
* identifier.py: extract identifiers only from the .data section in a bFLT
file
* bat-scan/bruteforcescan.py: store some statistics about the underlying
system in the result archive
2016-08-04:
* fwunpack.py: add support for Android sparse data image files (currently only
system.new.dat and only if the right information is there)
* bat-scan.config: add configuration for unpacking Android sparse data image
files
* fsmagic.py: add magic for ELF and BFLT
* prerun.py: add verifier for BFLT
* bat-scan.config: add configuration for tagging BFLT files
2016-08-03:
* identifier.py: work more on Dalvik unpacking (does not work yet)
* bruteforcescan.py: remove leafScan, run all the leafScans directly after
unpacking is done. For large firmwares this saves building a list of tasks in
memory, and possibly resources are better utilized when the unpacker is
running for a long time on a single task (LZMA unpacking for example), but
there are files for which leafscans can already be run.
* fwunpack.py: extra sanity check for lzop files
2016-08-02:
* identifier.py: work more on Dalvik unpacking (does not work yet)
2016-08-01:
* elfcheck.py: extra size check in case there are no section headers
* findlibs.py: replace call to readelf
* identifier.py: replace call to readelf
2016-07-31:
* elfcheck.py: add convenience method to extract information about a single
section from the ELF file
* identifier.py: replace call to readelf with own ELF parser (kernel symbols)
* kernelanalysis.py: replace call to readelf with own ELF parser (module
information). Support 2.4 kernel now too.
* findlibs.py: replace several calls to readelf
* findlibs.py: fixes for latest pydot (probably needs installation via pip,
out of scope, so keep disabled for now)
* prerun.py: remove call to readelf
* busybox.py: remove call to readelf
2016-07-30:
* elfcheck.py: extract symbols from strtab section from non-stripped binaries
* identifier.py: replace call to readelf with own ELF parser (symbols)
2016-07-29:
* javacheck.py: add more sanity checks
* fwunpack.py: some more sanity checks for ar
* elfcheck.py: correct offset
2016-07-28:
* elfcheck.py: return a list of SONAME values
* fixduplicates.py: replace call to readelf
* elfcheck.py: extract dynamic symbols (needs more work)
2016-07-27:
* elfcheck.py: split verification code into parser and verifier
* elfcheck.py: extract more information (soname, rpath, needed libs)
* checks.py: replace call to readelf with own ELF parser (dynamic libs)
2016-07-26:
* elfcheck.py: start working on own ELF parser
* checks.py: replace call to readelf with own ELF parser (architecture)
* createdb.py/postgresql-table.sql/postgresql-index.sql: add extra column in
'processed'
* fwunpack.py: extra sanity checks for Android sparse files
2016-07-13:
* fwunpack.py: remove external call to 'ar' in ar unpacking
* fwunpack.py: proper cleanup for xar
* prerun.py: remove verifyJavaClass
* fwunpack.py: add Java class carver
* bat-scan.config: add configuration for Java class carver
* javacheck.py: pass offset to Java class checker, drop requirement that whole
file has to be a class file
* prerun.py: keep mapping of offset to keys
2016-07-12:
* fsmagic.py: add magic for ICS
* fwunpack.py: add unpacker for ICS
* bat-scan.config: add configuration for ICS
* prerun.py: remove unused code in verifyJavaClass
2016-07-11:
* fwunpack.py: add unpacker for xar files
* bat-scan.config: add configuration for xar unpacker
2016-07-10:
* fsmagic.py: add identifier for xar
2016-07-09:
* fwunpack.py: add known method for ar (includes deb, udeb, etc.)
* fwunpack.py/fsmagic.py/bat-scan.config: lzo -> lzop
* fsmagic.py: add identifiers for ID3 (1 and 2)
* bruteforcescan.py/bat-scan: throw error if there are duplicate sections in
the configuration
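For illustration only, duplicate sections in an INI-style configuration can be detected as sketched below. This uses Python 3's standard `configparser` in strict mode, which is an assumption about tooling and not necessarily how bat-scan performs the check; the section and option names are made up.

```python
import configparser


def load_config(text):
    """Parse configuration text, refusing duplicate sections."""
    # strict=True makes the parser raise DuplicateSectionError when the
    # same section name appears twice in a single source.
    parser = configparser.ConfigParser(strict=True)
    parser.read_string(text)
    return parser


# Hypothetical configuration with the section 'gzip' defined twice.
sample = "[gzip]\ntype = unpack\n[gzip]\ntype = leaf\n"

try:
    load_config(sample)
    duplicate_section = None
except configparser.DuplicateSectionError as err:
    duplicate_section = err.section
```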
2016-07-08:
* fwunpack.py: rewrite ISO9660 unpacking (ISO9660, Rock Ridge) and remove fuse
and fuseiso dependencies. Still needs support for Joliet and zisofs to be on
par with fuseiso though.
2016-07-02:
* fwunpack.py: add more ISO9660 checks
* fwunpack.py: remove temporary files if ar unpacking failed but unpacked some
data
2016-07-01:
* bruteforcescan.py: add modular configuration
2016-06-30:
* findlibs.py: extra sanity check for architecture
* bruteforcescan.py: fix config parsing
* bruteforcescan.py: add configdirectory option to implement modular
configuration scanning
2016-06-24:
* bruteforcescan.py: fix database usage for leafscans
* fwunpack.py: start on implementing length checks for ISO9660
2016-06-23:
* fwunpack.py: add version byte sanity check for ISO9660
* bat-scan.config: remove obsolete config parameters
* tagging BAT 26
2016-06-22:
* prerun.py/fwunpack.py: remove Ogg tagger and replace with Ogg unpacker
* bat-scan.config: add configuration for Ogg unpacker, remove Ogg tagger
configuration
* bruteforcescan.py: fix leafscans debugging
* bat-unyaffs: more sanity checks to prevent false positives when scanning DLL
files
* fwunpack.py: clean up after squashfs unpackers if they cannot unpack and
some data was left behind (for example: symbolic links)
* fwunpack.py: crude hack to ignore some false positives for base64 decoding
2016-06-21:
* prerun.py/fwunpack.py: remove OTF tagger and replace with OTF unpacker
* bat-scan.config: add configuration for OTF unpacker, remove OTF tagger
configuration
* renamefiles.py: fix index, initramfs in kernels will not always be the first
item in the 'scans' list
* prerun.py: replace call to mkeot with own TTF verifier
* setup.cfg: remove dependency on eot-utils
* prerun.py/fwunpack.py: remove TTF tagger and replace with TTF unpacker
* bat-scan.config: add configuration for TTF unpacker, remove TTF tagger
configuration
* fwunpack.py: better check BMP files
* bruteforcescan.py: allow scans to indicate that certain other scans can
ignore the blacklist
* fwunpack.py: PNG can ignore the blacklist of TTF and OTF fonts (to unpack
glyphs)
2016-06-20:
* prerun.py/fwunpack.py: remove WOFF tagger and replace with WOFF unpacker
* bat-scan.config: add configuration for WOFF unpacker, remove WOFF tagger
configuration
* prerun.py: rework OTF verifier and no longer use external tools
* setup.cfg: remove fonttools as dependency
2016-06-19:
* fwunpack.py: replace call to webpng with own PNG checking code
* setup.cfg: remove dependency on gd-progs
* fwunpack.py: replace call to icotool with own unpacking code for ico,
since icotool would always spit out PNG files, not BMP files. BMP files still
need to be corrected though.
* prerun.py: more sanity checks for ICO files.
2016-06-18:
* prerun.py: remove verifyGraphics and verifyBMP
* fwunpack.py: add searchUnpackBMP that also allows unpacking and carving BMP
files
* bat-scan.config: add configuration for new BMP code
* fwunpack.py: tag romfs file systems, correct blacklist and size for romfs
2016-06-16:
* cveparser.py/createdb.config: open source first version of CVE parser
* fwunpack.py: tag ar files
* fwunpack.py: tag cramfs files, rework checks
* fwunpack.py: tag cpio files, correct size reporting of cpio
2016-06-15:
* fwunpack.py: add known file method for PNG files (extension: .png). This is
mostly to prevent bat-unyaffs running on many many PNG files.
* bat-scan.config: add known file method configuration for PNG files
* bruteforcescan.py: prevent files that are already known to be picked up by a
known file method if they happen to have the matching extension (example:
.png)
* bruteforcescan.py: refactor code for readability
* fwunpack.py: rework Java deserialization code. Remove jdeserialize as a
dependency and instead just focus on carving/verifying known serialized Java
files. Tag the serialized Java files as "javaserialized", instead of the data
grabbed from these deserialized files.
2016-06-14:
* bat-unyaffs: extra sanity check to prevent crash and filling up the abrt log
on Fedora systems
* bat-minix: extra sanity check to prevent crash and filling up the abrt log
on Fedora systems
* fsmagic.py: the PNG trailer has always the same length and CRC, so use it in
the signature
* fwunpack.py: extra sanity checks for PNG
* fwunpack.py: use deque for slicing PNG trailers that are left instead of
using lists. There are firmwares where (because of false positives and/or some
missing information, like on some Android firmwares) there can be a few
ten thousand PNG files, so the list of trailers can get really long. Unpacking
these files can take quite a long time.
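The two PNG related changes above (a fixed trailer signature and a deque of trailers) could look roughly like the sketch below. The helper names are hypothetical and this is not the fwunpack.py implementation. It relies on the IEND chunk always being the same 12 bytes: a zero length field, the chunk type, and the CRC32 of the chunk type.

```python
import zlib
from collections import deque

# IEND chunk: 4-byte length (0), the type "IEND", then the CRC32 of the type.
PNG_TRAILER = b"\x00\x00\x00\x00IEND" + zlib.crc32(b"IEND").to_bytes(4, "big")


def find_trailers(data):
    """Return a deque with the end offset of every PNG trailer in data."""
    trailers = deque()
    pos = data.find(PNG_TRAILER)
    while pos != -1:
        trailers.append(pos + len(PNG_TRAILER))
        pos = data.find(PNG_TRAILER, pos + 1)
    return trailers


def pair_headers(header_offsets, trailers):
    """Pair each header offset with the first trailer that follows it."""
    results = []
    for start in sorted(header_offsets):
        # Trailers at or before this header cannot belong to it or to any
        # later header, so discard them from the left. popleft() is O(1),
        # whereas slicing a long list copies the remaining elements.
        while trailers and trailers[0] <= start:
            trailers.popleft()
        if not trailers:
            break
        results.append((start, trailers[0]))
    return results


# Toy input: two fake "PNG" blobs with some junk in between.
data = b"\x89PNG" + b"x" * 10 + PNG_TRAILER + b"junk" + b"\x89PNG" + PNG_TRAILER
pairs = pair_headers([0, 30], find_trailers(data))
```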
2016-06-13:
* fwunpack.py: introduce setup scan for tar/jffs2/compress temporary unpacking dir
* bat-scan.config: fix defaults for lzma/jffs2/compress
* bruteforcescan.py: introduce global temporary unpacking directory
* identifier.py: replace DEX_TMPDIR with UNPACK_TEMPDIR
* fwunpack.py: replace setup methods for lzma, compress, tar and jffs2 with
environment lookup to UNPACK_TEMPDIR
* bat-scan.config: rename tempdir to unpackdirectory, add
temporary_unpackdirectory (to define UNPACK_TEMPDIR)
2016-06-12:
* bruteforcescan.py: introduce setup scans for unpack scans
* fwunpack.py: introduce setup scan for lzma temporary unpacking dir
* bat-scan.config: fix defaults for lzma
2016-06-11:
* bruteforcescan.py: make timeout for tasks configurable, default 1 month
* bruteforcescan.py: make using the database optional, default: yes
* bruteforcescan.py: implement scrubbing of the configuration file (example:
database credentials)
* bruteforcescan.py: fix dumping of marker search offset data
* bruteforcescan.py: support 'compress' in top level configuration and use it
in more places
* bat-scan.config: add more defaults
* docs: better document options in configuration file
2016-06-10:
* bruteforcescan.py: remove unnecessary code, better document flow
2016-06-08:
* bat-scan: add sanity check, don't scan if there are no tasks (saves running
set up code)
* bruteforcescan.py: remove some unused code, better document code
2016-06-07:
* remove SQLite support from scanning engine, use PostgreSQL only
* bruteforcescan.py: simplify data structures for unpack scan processing
* bat-scan.config: simplify configuration file, add some saner defaults
2016-05-02:
* identifier.py: fix return type for Linux kernel symbols
* bruteforcescan.py: start on carving out data that could not be unpacked, but
for which it is still useful for it to be carved out. Example: a Linux kernel
image, or several Linux kernel images, or a bootloader, from a larger
firmware.
2016-04-28:
* bat-sqlitetopostgresql.py: change hashconversion because now tlsh hashes are
stored too
2016-04-24:
* createmanifests.py: more aggressively cache hashes
2016-04-11:
* identifier.py: try to identify source code file names in Linux kernel binary
images
* identifier.py: try to extract Linux kernel symbol names regardless whether
or not certain database tables exist (leftover from before refactoring)
* identifier.py: better deal with the situation where loops_per_jiffy appears
multiple times in a Linux kernel binary image and is the first in the list
of kernel symbols (not preceded by a NULL character)
2016-04-08:
* prerun.py: extra sanity check for ELF files (ELF type)
2016-03-29:
* bat-scan.config: add configuration for unpacking Parrot PLF files.
2016-03-28:
* fwunpack.py: extract more information from Parrot PLF files. This still
needs some more work, as PLF files have a slightly non-standard structure.
2016-03-27:
* fwunpack.py: start working on unpacking Parrot PLF files
2016-03-26:
* prerun.py: extra sanity checks for ihex unpacking
* fwunpack.py: add extra flavour of squashfs
* add reportcopyright.py to do a simple search of copyright statements in
binaries
* bat-scan.config: add configuration for copyright statement checker
2016-03-19:
* fwunpack.py: fix extra sanity check in ext2 unpacking
2016-03-07:
* generatejson.py: don't add a connection to the list of connections twice
2016-03-05:
* prerun.py: add verifier for Intel HEX files
* fwunpack.py: unpack Intel HEX files
* bat-scan.config: add configuration to unpack Intel HEX files
2016-02-27:
* generatereports.py: make gzip compression optional
* guireport.py: make gzip compression optional
* bruteforcescan.py: make gzip compression of pickles optional (not enabled
yet)
* rename generatelistrpm.py to extractrpms.py
* generatejson.py: remove duplicate code
2016-02-26:
* createdb.py: work around error in zipfile module. The file
xf86-input-keyboard-1.3.1.tar.bz2 from OpenWrt for example was misidentified
as a ZIP file.
* generatelistrpm.py: fixes for crc32
2016-02-23:
* bruteforcescan.py: add 'compress' configuration option (default: no) for
later use in reporting
* generatejson.py: make JSON gzip compression optional
* bat-scan.config: compress JSON files by default
* bat-scan: fix using -u with relative paths
2016-02-22:
* createdb.py: work around Ninka barf
2016-02-21:
* createdb.py: fix invocation for Ninka
2016-02-20:
* createdb.py: interface to Ninka has changed, bump to 2.0-pre1
2016-02-14:
* fsmagic.py: add marker for Android backup files
* fwunpack.py: unpack Android backup files
* bat-scan.config: enable Android backup file unpacking
2016-01-30:
* batextensions.py: also look at .dts and .dtsi files (Linux kernel specific)
2016-01-29:
* updatesha256sum.py/generatelistrpm.py/createdb.py/createmanifests.py: newer
TLSH implementations can handle files 256 bytes in size
2016-01-27:
* generatejson.py: extra sanity check
* tag BAT 24
2016-01-26:
* fwunpack.py: extra sanity check for "compress"
2016-01-25:
* fsmagic.py: add marker for MS WIM files
* fwunpack.py: dd copies permissions of original file, so chmod after dd
* bruteforcescan.py: chmod after copying the binary to scan
* fwunpack.py: use unpackFile() in ISO9660 unpacking
* fwunpack.py: unpack MS WIM files with 7z
* bat-scan.config: enable MS WIM unpacking
* fwunpack.py: unshield can leave partial results, so clean up
2016-01-24:
* fwunpack.py: fix sanity check in jpeg unpacking, extract (some) XMP data
from JPEG files
2016-01-23:
* generatejson.py: avoid creating database connections unnecessarily
* bruteforcescan.py: fix debugging logic after rearranging scans
* bruteforcescan.py: reintroduce setup methods for aggregate scans
* licenseversion.py: use setup method
* kernelsymbols.py: use setup method
* bat-scan.config: configure setup methods
* bruteforcescan.py: cleanups, don't create pools when not needed
* licenseversion.py: inline compute_version()
* licenseversion.py: more aggressively cache results
* generatejson.py: first dump JSON results, then compress with gzip
* licenseversion.py: don't create database connections if not needed. This
helps in 'directory scanning mode' to keep the amount of database connections
and sockets down.
2016-01-22:
* bat-scan: extra sanity check for broken symbolic links, plus properly create
subdirectories in 'directory scanning mode'.
* bat-scan/bruteforcescan.py: only do setup scans once when scanning in
directory mode
2016-01-20:
* bat-scan: extra sanity check to make sure that file dir and output dir are
not the same
* prerun.py: add simple verifier for WAV files
* fwunpack.py: extra sanity check for compress: first unpack some data and try
to uncompress it. Because it is a stream this will succeed, even if the file
is not complete. False positives can be filtered out easily this way.
2016-01-19:
* fwunpack.py: tag Java deserialized data
* licenseversion.py: explicitly call commit()
* generatejson.py: explicitly call commit()
* identifier.py: explicitly call commit()
* security.py: add code to check for certificates
* bat-scan.config: add configuration for tagging certificates
* identifier.py: treat Java serialized files as Java
* licenseversion.py: explicitly do variable names processing just for C files
now
* licenseversion.py: don't unnecessarily read pickle files
* prerun.py: remove bogus check for appledouble files
* identifier.py: tag oat files as Java and ignore for now.
2016-01-18:
* prerun.py: finish code to tag WebP files
* prerun.py: add another known extension for ICO files
* prerun.py: also tag XML files that start with a UTF-8 byte order marker
2016-01-17:
* prerun.py: tag Chrome .pak files
* prerun.py: tag more RSA certificate files
* fsmagic.py: add magic for RIFF
* prerun.py: start on verifying and tagging WebP files
* bat-scan.config: add configuration for WebP verification prerun scan
2016-01-16:
* licenseversion.py: remove unused parameter from prune()
* licenseversion.py: create cursors per language and process per language.
This can save quite a few open sockets when using PostgreSQL.
* file2package.py: fix multiprocessing
2016-01-15:
* prerun.py: better checks for Android Dex/Odex
* prerun.py: tag Android oat files as "oat"
* identifier.py: start parsing Dex files
* security.py: tag openssh private keys
* batdb.py: set autocommit for PostgreSQL
* bruteforcescan.py/generatejson.py/licenseversion.py: set a time out for the
queues to avoid PostgreSQL "idle in transaction" errors
2016-01-13:
* bat-scan.config: ignore sqlite3 files in the ranking scan
* generatejson.py: use the length of jsontasks to determine the minimum of
processes needed instead of the jsontasks list
* generatejson.py: don't join the queue with just one thread, as that
effectively makes generating the json files single threaded
* fwunpack.py: merge unpackIco into searchUnpackIco
* fwunpack.py: fix multiframe JPEG unpacking
* fsmagic.py: add magic for Android oat packages
2016-01-12:
* prerun.py: add simple verifier for RSA certificates which can often be found
in Android APK files
2016-01-11:
* prerun.py: start working on better sanity checks for MS Windows icons and
cursor icon files
2016-01-10:
* bruteforcescan.py: move some code and comments related to unpack scans from
leaf scans to unpack scans
* prerun.py: thorough sanity checks for Android's Dex format
* bat-scan.config: set minimumsize for iso9660 files
2016-01-09:
* fwunpack.py: only read two bytes for the ext4 sanity check in android sparse
image files instead of the entire file. D'oh!
* ext2.py: simplify ext2 unpacking
* licenseversion.py: only call JAR aggregator if there are Java results
2016-01-08:
* fwunpack.py: extra sanity checks for Squashfs unpacking
* fwunpack.py: simplify squashfs unpacking for weird squashfs
* fwunpack.py: first check integrity of Android sparse file before unpacking
* fsmagic.py: add magic for AppleDouble
* prerun.py: tag AppleDouble encoded files (resource forks)
* bat-scan.config: ignore resource forks in many scans
* fwunpack.py: update Android sparse file unpacker for images that have an
extra header in front of the ext4 file system. Needs many more sanity checks.
2016-01-07:
* bat-scan.config: don't test text files for Ogg
* fwunpack.py: ext2 image cannot extend beyond the filesize. For this check
also take the offset into account.
* bruteforcescan.py: introduce minimumsize for unpackscans, so unpackscans can
skip files that are too small
* bat-scan.config: set minimumsize for yaffs2 as an example
* fwunpack.py: don't unnecessarily loop over JPEG trailers
2016-01-06:
* bat-unyaffs (bat-extratools): support inband tags
* fwunpack.py: rewrite YAFFS2 unpacking to use new bat-unyaffs and unpack more
YAFFS2 file systems
* fwunpack.py: fix JPEG size reporting
2016-01-04:
* unpackrpm.py: read a bit less data in RPM unpacking, add more sanity checks,
add support for more payloads (depends on support in rpm2cpio though)
* fwunpack.py: fix incorrect check in ZIP unpacking so a lot of data was still
written to disk instead of directly into memory.
* fwunpack.py: rewrite ZIP unpacking again, to correctly work around ZIP files
just being stored (so not deflated again) inside other ZIP files, inside a
file system that cannot be easily unpacked (like some flavours of YAFFS2)
2016-01-03:
* fwunpack.py: tagging fixes for XZ files
* fwunpack.py: introduce ZIP_MEMORY_CUTOFF environment variable
* fwunpack.py: fix diroffsets for PNG
* checks.py: if there is blacklisted data skip checks that won't succeed
because there is too little data
* busyboxversion.py: if there is blacklisted data skip tests that won't
succeed because there is too little data
* busybox.py: don't read a file to search for the busybox version number if no
busybox marker is present
* busybox.py: try to match version numbers earlier, read a lot less data
* fwunpack.py: fix encrypted ZIP reporting
* fwunpack.py: add searchUnpackKnownZip() to bypass checks for files with
known ZIP extensions. If unpacking the file this way is unsuccessful the file
will be unpacked using the regular scanning process. Most files with known ZIP
extensions will very likely be ZIP files.
* bruteforcescan.py: remove tagKnownExtension() as it is no longer needed
2016-01-02:
* fwunpack.py: vastly simplify ZIP unpacking
2015-12-31:
* fwunpack.py: carve ZIP archives from larger files in a different way, add
many more sanity checks for ZIP unpacking
2015-12-30:
* busyboxversion.py: like checks.py don't first write all non-blacklisted data
to a file to search for the version string, but write each chunk of
non-blacklisted data to a temporary file and then search that temporary file
2015-12-28:
* fwunpack.py: rework squashfs unpacking: filter out false positives earlier
and report file size for more flavours
* fsmagic.py: make Microsoft Cabinet archive header checking more robust
* fwunpack.py: extra sanity check for RAR files (RAR 4 and earlier only for
now)
2015-12-27:
* fsmagic.py: make RAR header checking a bit more robust
* fwunpack.py: refactor some squashfs checks
* fwunpack.py: do yaffs2 sanity check earlier
* fwunpack.py: sanity checks for other variants of squashfs (dd-wrt, openwrt)
* fwunpack.py: work around libmagic barf in exe unpacking
2015-12-26:
* fwunpack.py: version check fixes for squashfs, grab size from some squashfs
file systems directly from the file
* fwunpack.py: check for validity of ZIP files would fail if ZIP file did not
start at offset 0
2015-12-24:
* checks.py: when data in a file is blacklisted don't first write all the
non-blacklisted data to a file and then search for identifiers, but search the
data right away.
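As a rough sketch of searching only the non-blacklisted data directly, as the checks.py entry above describes: the code below assumes the blacklist is a list of (start, end) byte ranges with an exclusive end, which is an assumption about the data structure, and both function names are hypothetical.

```python
def unblacklisted_ranges(filesize, blacklist):
    """Yield (start, end) byte ranges that are not covered by the blacklist.

    blacklist is assumed to be a list of (start, end) tuples, end exclusive.
    """
    pos = 0
    for start, end in sorted(blacklist):
        if start > pos:
            yield (pos, start)
        # Overlapping or nested blacklist entries must not move pos back.
        pos = max(pos, end)
    if pos < filesize:
        yield (pos, filesize)


def contains_marker(data, blacklist, marker):
    """Search each non-blacklisted chunk directly, without a temporary file."""
    return any(marker in data[start:end]
               for start, end in unblacklisted_ranges(len(data), blacklist))


ranges = list(unblacklisted_ranges(100, [(10, 20), (15, 30), (90, 100)]))
```

A marker that straddles a blacklisted range is deliberately not found, since each chunk is searched on its own.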
2015-12-22:
* fwunpack.py: reset offset in cpio file for each trailer
2015-12-20:
* fwunpack.py: extra sanity checks for lzip, no longer depend on processing
(English) output of lzip for file size verification
* fwunpack.py: make names of anonymous PNG, JPEG and GIF files more predictable
2015-12-19:
* fwunpack.py: extra LZMA check
* fwunpack.py: replace dependency on output of tune2fs by checking ext2 file
directly
* bruteforcescan.py: older versions of python-magic don't have CDF flags
* bruteforcescan.py: don't build intermediate data structures unnecessarily
2015-10-31:
* busybox.py: fix finding version number
* fwunpack.py: fix GIF unpacking
* guireport.py: correctly report size of files even if no magic is set because
libmagic barfed
2015-10-18:
* createdb.py: fix invocation of filterfiles()
* createdb.py: don't try to delete the Ninka comments file twice
2015-10-13:
* fwunpack.py: unsquashfs can change the permissions of the directory it unpacks
data into, so the permissions need to be changed back.
* fwunpack.py: detect segfaults of certain squashfs unpackers and work around
them
* bruteforcescan.py: disable CDF scanning when determining 'magic'
2015-10-12:
* prerun.py: extra sanity check for verifyELF
2015-10-08:
* fwunpack.py: extra sanity check for broken gzip data
* bruteforcescan.py: deal with broken symbolic links when python-magic does not
like them because of encoding issues
2015-10-03:
* remove pretty printing option, as it was unmaintained
* fwunpack.py: no longer rely on 'magic' to unpack cab files, better tag cab
files where the whole file is a cab archive
* fwunpack.py/extractor.py: rework extraction of Windows executable assembly
information
* fwunpack.py: extra sanity check for LRZIP
* prerun.py: sanity check for processing readelf output
* fwunpack.py: extra sanity check for XMP data in GIF
2015-09-29:
* licenseversion.py: explicitly call commit() on database connections to
avoid lots of warning messages in the PostgreSQL log
* release BAT 23
* fwunpack.py: some more sanity checks for YAFFS2 unpacking (files need to be
at least 512+16 bytes)
2015-09-28:
* generatejson.py: avoid running out of memory when dumping data to a json
file.
2015-09-27:
* fwunpack.py: allow carving of SWF files from the middle of a file.
* fwunpack.py: don't use struct.unpack() for PNG IHDR check as it will always
be the same anyway
* fwunpack.py: add first version of JPEG unpacker
* prerun.py: remove verifyJPEG() as it is no longer needed
* prerun.py: remove verifyJAR(), as it was not used and not working properly
anyway.
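
The PNG IHDR entry above relies on the fact that the IHDR chunk always has a
length of 13 bytes, so the length field can be compared as fixed bytes instead
of being decoded with struct.unpack(). A minimal sketch, with a hypothetical
function name:

```python
# The 8 byte PNG signature, followed by the first chunk header.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'
# Chunk length (always 13, big endian) followed by the chunk type.
IHDR_PREFIX = b'\x00\x00\x00\x0dIHDR'


def has_valid_ihdr(data, offset=0):
    """Check that a PNG signature at offset is directly followed by IHDR."""
    header = data[offset:offset + len(PNG_SIGNATURE) + len(IHDR_PREFIX)]
    return header == PNG_SIGNATURE + IHDR_PREFIX
```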
2015-09-26:
* kernelanalysis.py/other files: remove calls to "modinfo" and replace by
"readelf" instead
* prerun.py: fix tagging of some ELF files
* checks.py/guireport.py: fix reporting of names of applications found in the
'marker' search
* fwunpack.py: check if PNG has a valid IHDR chunk
* remove module-init-tools as dependency
* fwunpack.py: rework lookup of big endian candidates of JFFS2 file systems.
Using a set instead of a list can save a lot of time. Also implement for
cramfs (although there are not as many false positives as with JFFS2).
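
Why a set helps for the JFFS2 and cramfs candidate lookup above: a membership
test on a list is linear in the number of candidates, while on a set it is
constant time on average. A small sketch, assuming the candidates are plain
integer offsets (the function and parameter names are hypothetical):

```python
def split_by_endianness(offsets, bigendian_offsets):
    """Pair every candidate offset with a flag telling if it is big endian."""
    # Build the set once; each "in" test below is then O(1) on average
    # instead of a scan through the whole list.
    bigendian = set(bigendian_offsets)
    return [(offset, offset in bigendian) for offset in offsets]
```

With many false positives the list of candidates can get long, which is where
the difference becomes noticeable.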
2015-09-25:
* checks.py: merge all marker searches for specific programs into one method
to avoid reading a file X times
* bruteforcescan.py: allow dumping of offsets into a Python pickle if
configured
* bruteforcescan.py: make minimum threshold of parallel generic marker search
scan for the top level file configurable
* bat-scan: add sanity check for configuration file
* fwunpack.py: fixes for end of line marker in the PDF xref section
* bruteforcescan.py: import (most) prerun scans and unpack scans once per
thread instead of once per file
2015-09-24:
* bat-scan.py: add --version option
* prerun.py: correctly tag Linux kernel modules that have a signature
* prerun.py: remove verifyGzip as it is no longer needed since gzip unpacking
has become a lot more efficient. Also removes another instance of processing
English language output of an external command.
2015-09-19:
* prerun.py: allow marker search to work on a chunk of a file
* bruteforcescan.py: allow the marker search to be run in parallel for the top
level file, if it is larger than a certain size (hardcoded for now) and it
does not have a known extension
* bruteforcescan.py: don't build a list of scan tasks first, but put tasks
into the scanning queue immediately, saving memory and making results
available a very very tiny bit earlier
2015-09-18:
* fwunpack.py: rework LRZIP unpacking, with more sanity checks (size + MD5),
allow uncarving from a bigger blob and remove a dependency on processing
(English) output of lrunzip which prevents BAT from running on systems with
other locales
2015-09-17:
* fwunpack.py: more sanity checks for GIF files
* fwunpack.py: write out unpacked LZMA data earlier, tag and blacklist LZMA
data whenever possible
* fwunpack.py: don't forget to remove temporary gzip directories after
unsuccessful unpacking
* fwunpack.py: add a (crude) way to avoid using "dd" in unpackFile() when
carving data up to a certain length. This is useful in case the data has to be
carved from really big files
* prerun.py: do not tag files as "elf" if the size does not correspond with
the file size. Test case: a firmware that starts with a valid ELF file. More
research is needed.
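
One possible way to carve data without calling "dd", as mentioned for
unpackFile() above. This is only a sketch and not the code used in
fwunpack.py; the function name, parameters and chunk size are assumptions:

```python
def carve(source, target, offset, length, chunksize=1024 * 1024):
    """Copy length bytes starting at offset from source into target.

    Data is copied in chunks so that large carves do not have to fit in
    memory. Returns the number of bytes that were actually written.
    """
    remaining = length
    with open(source, 'rb') as infile, open(target, 'wb') as outfile:
        infile.seek(offset)
        while remaining > 0:
            data = infile.read(min(chunksize, remaining))
            if not data:
                # The source is shorter than offset + length.
                break
            outfile.write(data)
            remaining -= len(data)
    return length - remaining
```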
2015-09-16:
* bruteforcescan.py: allow tagging for files that have not been completely
scanned yet and pass tags as hints to unpacked files
* prerun.py: remove verifyGIF() as it is no longer needed, ignore files tagged
* fwunpack.py: filter out LZMA false positives by decompressing a small amount
of data. Also add some more sanity checks to filter out more false positives.
* fwunpack.py: don't follow symlinks in iso9660 files when copying the data.
* fwunpack.py: extra ar header sanity check
2015-09-15:
* fwunpack.py: don't read all data from a file in PNG unpacking (leftover from
old code that should have been deleted), also let webpng read from stdin to
avoid hitting disk unnecessarily.
* disable verifyMP4 for now, as mp4dump no longer seems to be packaged in
Fedora's libmp4v2 package
* fwunpack.py: don't read the whole file for CPIO unpacking, but only the
CPIO data
* fwunpack.py: fix gzip file renaming if a filename was recorded in the gzip
file
* fwunpack.py: if length parameter given to unpackFile() is the same as the
file size set the length to 0 instead, so the file can be copied or
hardlinked.
* fwunpack.py: if length given to unpackFile() is larger than dd's maximum
limit: copy first, then truncate
* fwunpack.py: tag jffs2 files if the whole file is a file system
* fwunpack.py: correct offset reporting for PNG files
* bruteforcescan.py: allow scans to pass contextual information about unpacked
files to their children. This is useful for example to tell that files that were
unpacked, such as PNG or GIF files are already complete files and no longer
need to be scanned at all, saving memory, disk I/O and CPU time by skipping
the marker search and any other scans as well
* prerun.py: remove verifyPNG() as it is no longer needed, ignore files tagged
as 'graphics' in verifyGraphics()
* prerun.py/bat-scan.config: don't run verifyText() for files that have
already been tagged as binary or text
2015-09-14:
* bruteforcescan.py: remove superfluous parameter to leafScan(), fix scanname
in scan()
* fwunpack.py: replace unpackGzip() by raw deflate unpacking. It can save a
lot of disk I/O when carving gzip compressed data from large files
* fwunpack.py: do not try to unpack a romfs that has 0 bytes
* fwunpack.py: rename unpacked files that are gzip compressed and end with
.tgz or .gz in the same way as gunzip does, unless the file had a name
recorded in the gzip file
* fwunpack.py: extra sanity check for LZO compressed data. Needs more work.
* fwunpack.py: make PNG and GIF unpacking a lot more efficient
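
The gunzip-style renaming described in the 2015-09-14 entries could look
roughly like this. It only covers the two extensions named in the entry (.tgz
and .gz), and the function name is hypothetical:

```python
def gunzip_name(filename, recorded_name=None):
    """Return the name for unpacked gzip data, or None if there is no rule.

    A name recorded in the gzip header takes precedence. Otherwise .tgz
    becomes .tar and .gz is stripped, like gunzip does.
    """
    if recorded_name:
        return recorded_name
    if filename.endswith('.tgz'):
        return filename[:-len('.tgz')] + '.tar'
    if filename.endswith('.gz'):
        return filename[:-len('.gz')]
    return None
```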
2015-09-13:
* prerun.py: allow setting offset for genericMarkerSearch
* fwunpack.py: remove useless check (output from pdfinfo), add extra sanity
checks for PDF unpacking
* prerun.py: remove verifyBZ2, as it was broken and unused
* fwunpack.py: big file fixes for XZ unpacking
* fwunpack.py: mimic behaviour of unxz in XZ unpacking if filename ends in .xz
and entire file is an XZ compressed file
* bruteforcescan.py: pass hints from a scan about its children.
* bruteforcescan.py: only compute which files to ignore by prerun scans once
instead of per thread
* bruteforcescan.py: fix putting tasks in scanning queue
* bruteforcescan.py: add offsets to scan tasks. This might come in handy, for
example for precomputing offsets
* fwunpack.py: mimic behaviour of bunzip2 if filename ends in .bz2 or .tbz2
and entire file is a bzip2 file
2015-09-12:
* fwunpack.py: use the BZ2Decompressor object from Python's bz2 module and tag
bzip2 files as 'bzip2' and 'compressed' if possible. Fix blacklisting and
remove the unpackBzip2() method.
* fwunpack.py: create searchUnpackKnownBzip2() method to short cut unpacking
of files ending in .bz2 extension
* file2package.py: rewrite file2package to an aggregate scan to get rid of
lots of database connections
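
A sketch of how the BZ2Decompressor object from Python's bz2 module can be
used to unpack bzip2 data carved from a larger blob and to find out where the
compressed stream ends, which is what blacklisting needs. It assumes Python 3
and reads everything into memory for brevity; the function name and the
return value are hypothetical:

```python
import bz2


def unpack_bzip2(data, offset=0):
    """Try to decompress a bzip2 stream that starts at offset in data.

    Returns (unpacked data, compressed size, whole file flag), or None
    if no complete bzip2 stream was found at offset.
    """
    decompressor = bz2.BZ2Decompressor()
    try:
        unpacked = decompressor.decompress(data[offset:])
    except (OSError, ValueError):
        # Not valid bzip2 data.
        return None
    if not decompressor.eof:
        # The stream is truncated.
        return None
    # Everything after the end of the stream ends up in unused_data.
    compressed_size = len(data) - offset - len(decompressor.unused_data)
    whole_file = offset == 0 and not decompressor.unused_data
    return (unpacked, compressed_size, whole_file)
```

If the whole file flag is set the file could be tagged as 'bzip2' and
'compressed', as in the entry above.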
2015-09-11:
* licenseversion.py: fix license scanning for sqlite databases, make use of
queues for copyright lookup
* licenseversion.py: avoid starting processes unnecessarily
* generatejson.py: use queues instead of pool.map() to avoid making lots of
database connections
* bruteforcescan.py: add stubs for blacklisting files that should not be
scanned, also close database connections that are not needed
* kernelsymbols.py: correctly close pool of workers, do not make database
connections if not needed
2015-09-10:
* fwunpack.py: add unpacking for some MSI files
* licenseversion.py: reduce database connections when using PostgreSQL,
because under certain conditions it is possible to run out of network
connections as sockets still stay open for a little while after closing the
connection to the database.
* generatereports.py: fix reporting for certain packages
* licenseversion.py: combine extractJavaNames and extractVariablesJava to save
on database connections
2015-09-07:
* bruteforcescan.py: several bug fixes and sanity checks related to database
connectivity
2015-09-06:
* bruteforcescan.py: add hooks for methods that can unpack files based on
extension, circumventing most of the 'brute force' scanning approach
* fwunpack.py: add method to scan gzip files based on extension
* bruteforcescan.py: if configured, look up the SHA256 of the file in the BAT
database to see if it is a known source code file. If so, tag it.
* bat-scan.config: ignore files that have been identified as source code files
that are in the BAT database.
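
A rough sketch of the SHA256 lookup described above. The BAT database is
replaced here by a plain set of known checksums, and the function names and
the 'sourcecode' tag are assumptions for illustration only:

```python
import hashlib


def sha256_of_file(path, chunksize=1024 * 1024):
    """Compute the SHA256 of a file without reading it into memory at once."""
    digest = hashlib.sha256()
    with open(path, 'rb') as scanfile:
        for chunk in iter(lambda: scanfile.read(chunksize), b''):
            digest.update(chunk)
    return digest.hexdigest()


def tag_known_source(path, known_checksums):
    """Return the tags for a file; known_checksums stands in for the database."""
    if sha256_of_file(path) in known_checksums:
        return ['sourcecode']  # hypothetical tag name
    return []
```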
2015-09-05:
* unpackrpm.py: fix verification of RPM files if the RPM is embedded into
another file and RPM offset does not start at 0
* fwunpack.py: unpack Windows help files (.chm)
* prerun.py: recognize and tag a few known certificate files so they can be
ignored by other scans
* fwunpack.py: extract and process GIF files in a slightly lazier way, reading
less data and (possibly) reducing memory usage
* fwunpack.py: read less data when extracting PNG files
* fwunpack.py: fix ext2 unpacking (don't try to remove directory that was
never made in the first place)
2015-09-04:
* fwunpack.py/bruteforcescan.py: work around encoding issues in python-magic
* prerun.py: record more ELF types
* bat-scan/bruteforcescan.py: introduce 'cleanup' for cleaning scanning
directories. This is useful when scanning a directory with lots of files that
all need to be scanned and you don't want the unpacking directory to overflow
with result directories
* fwunpack.py: version checks for LRZIP
* licenseversion.py: first check if there are any files for which there
actually are identifiers before creating database connections and fetching
data from the database
* javacheck.py: also support class file format used in Java SE7 and SE8
* fwunpack.py: fix GIF length check
2015-09-03:
* fwunpack.py: don't carve gzip compressed data out of a file if the deflate
test already uncompressed all data. Instead, write it out to a file directly,
possibly avoiding a lot of I/O.
* fwunpack.py: clean up SWF unpacking, tag successfully unpacked SWF files.
2015-09-02:
* fwunpack.py: sanity check for LZO version number
* fwunpack.py: extra sanity check for gzip by trying to uncompress a block of
deflate data using zlib
* fwunpack.py: don't read data unnecessarily in gzip crc data check, but first
seek to offset
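
The gzip sanity check from the entries above (uncompressing a block of
deflate data using zlib) could be sketched as follows, assuming Python 3. The
header layout follows the gzip file format; the function name and the block
size are assumptions, not what fwunpack.py uses:

```python
import zlib


def looks_like_gzip(data, offset=0, blocksize=4096):
    """Skip the gzip header at offset and try to inflate one block of data."""
    header = data[offset:offset + 10]
    if len(header) < 10 or header[:3] != b'\x1f\x8b\x08':
        return False
    flags = header[3]
    if flags & 0xe0:
        # Reserved flag bits have to be zero.
        return False
    position = offset + 10
    if flags & 0x04:
        # FEXTRA: two byte little endian length, followed by the extra data.
        if len(data) < position + 2:
            return False
        position += 2 + int.from_bytes(data[position:position + 2], 'little')
    for flag in (0x08, 0x10):
        # FNAME and FCOMMENT are both zero terminated strings.
        if flags & flag:
            end = data.find(b'\x00', position)
            if end == -1:
                return False
            position = end + 1
    if flags & 0x02:
        # FHCRC: two byte header checksum.
        position += 2
    block = data[position:position + blocksize]
    if not block:
        return False
    try:
        # Negative wbits means raw deflate data without zlib header.
        zlib.decompressobj(-zlib.MAX_WBITS).decompress(block)
    except zlib.error:
        return False
    return True
```

Only a small block is inflated, so false positives can be discarded before
any data is carved or written to disk.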
2015-08-30:
* fwunpack.py: add extra sanity checks for gzip (deflate header checks)
* fwunpack.py: add revision check for ext2/3/4 file systems
2015-08-29:
* remove gcc-java (for jcf-dump) as a dependency
* generatejson.py: decode weird characters
* identifier.py: also check DEX_TMPDIR when using PostgreSQL
* prerun.py: avoid calling xmllint for files that are guaranteed to be not XML
files
2015-08-28:
* generatejson.py: include size
* fwunpack.py: fix yaffs2 blacklisting check
2015-08-27:
* generatejson.py: sanity check
* security.py: sanity check
2015-07-27:
* createdb.py: fix index name
* createdb.py: don't read from a closed file
2015-06-24:
* fwunpack.py: fix CPIO blacklisting
2015-06-23:
* fwunpack.py: extra sanity check for LZMA
* fwunpack.py: unpack some YAFFS2 file systems if they occur in the middle of
a file. This does not work perfectly yet.
2015-06-20:
* fwunpack.py: fix and expand CPIO checks
* fwunpack.py: extra sanity check for ar format