# Version 9.4.0
#
# This file contains possible setting/value pairs for configuring Splunk
# software's processing properties through props.conf.
#
# Props.conf is commonly used for:
#
# * Configuring line breaking for multi-line events.
# * Setting up character set encoding.
# * Allowing processing of binary files.
# * Configuring timestamp recognition.
# * Configuring event segmentation.
# * Overriding automated host and source type matching. You can use
# props.conf to:
# * Configure advanced (regular expression-based) host and source
#   type overrides.
# * Override source type matching for data from a particular source.
# * Set up rule-based source type recognition.
# * Rename source types.
# * Anonymizing certain types of sensitive incoming data, such as credit
# card or social security numbers, using sed scripts.
# * Routing specific events to a particular index, when you have multiple
# indexes.
# * Creating new index-time field extractions, including header-based field
# extractions.
# NOTE: Do not add to the set of fields that are extracted
# at index time unless it is absolutely necessary because there are
# negative performance implications.
# * Defining new search-time field extractions. You can define basic
# search-time field extractions entirely through props.conf, but a
# transforms.conf component is required if you need to create search-time
# field extractions that involve one or more of the following:
# * Reuse of the same field-extracting regular expression across
# multiple sources, source types, or hosts.
# * Application of more than one regular expression (regex) to the
# same source, source type, or host.
# * Delimiter-based field extractions (they involve field-value pairs
# that are separated by commas, colons, semicolons, bars, or
# something similar).
# * Extraction of multiple values for the same field (multivalued
# field extraction).
# * Extraction of fields with names that begin with numbers or
# underscores.
# * Setting up lookup tables that look up fields from external sources.
# * Creating field aliases.
#
# NOTE: Several of the above actions involve a corresponding transforms.conf
# configuration.
#
# You can find more information on these topics by searching the Splunk
# documentation (http://docs.splunk.com/Documentation/Splunk).
#
# There is a props.conf in $SPLUNK_HOME/etc/system/default/. To set custom
# configurations, place a props.conf in $SPLUNK_HOME/etc/system/local/. For
# help, see props.conf.example.
#
# You can enable configuration changes made to props.conf by typing the
# following search string in Splunk Web:
#
# | extract reload=T
#
# To learn more about configuration files (including precedence) see
# the documentation located at
# http://docs.splunk.com/Documentation/Splunk/latest/Admin/Aboutconfigurationfiles
#
# For more information about using props.conf in conjunction with
# distributed Splunk deployments, see the Distributed Deployment Manual.
# GLOBAL SETTINGS
# Use the [default] stanza to define any global settings.
# * You can also define global settings outside of any stanza, at the top
# of the file.
# * Each conf file should have at most one default stanza. If there are
# multiple default stanzas, settings are combined. In the case of
# multiple definitions of the same setting, the last definition in the
# file wins.
# * If a setting is defined at both the global level and in a specific
# stanza, the value in the specific stanza takes precedence.
[<spec>]
* This stanza enables properties for a given <spec>.
* A props.conf file can contain multiple stanzas for any number of
different <spec>.
* Follow this stanza name with any number of the following setting/value
pairs, as appropriate for what you want to do.
* If you do not set a setting for a given <spec>, the default is used.
<spec> can be:
1. <sourcetype>, the source type of an event.
2. host::<host>, where <host> is the host, or host-matching pattern, for an
event.
3. source::<source>, where <source> is the source, or source-matching
pattern, for an event.
4. rule::<rulename>, where <rulename> is a unique name of a source type
classification rule.
5. delayedrule::<rulename>, where <rulename> is a unique name of a delayed
source type classification rule.
These rules are considered only as a last resort, before Splunk software
generates a new source type based on the source seen.
**[<spec>] stanza precedence:**
For settings that are specified in multiple categories of matching [<spec>]
stanzas, [host::<host>] settings override [<sourcetype>] settings.
Additionally, [source::<source>] settings override both [host::<host>]
and [<sourcetype>] settings.
**Considerations for Windows file paths:**
When you specify Windows-based file paths as part of a [source::<source>]
stanza, you must escape any backslashes contained within the specified file
path.
Example: [source::c:\\path_to\\file.txt]
**[<spec>] stanza patterns:**
When setting a [<spec>] stanza, you can use the following regex-type syntax:
...  recurses through directories until the match is met
     or equivalently, matches any number of characters.
*    matches anything but the path separator 0 or more times.
     The path separator is '/' on Unix, or '\' on Windows.
     Intended to match a partial or complete directory or filename.
|    is equivalent to 'or'
( )  are used to limit scope of |.
\\ = matches a literal backslash '\'.
Example: [source::....(?<!tar.)(gz|bz2)]
This matches any file ending with '.gz' or '.bz2', provided this is not
preceded by 'tar.', so tar.bz2 and tar.gz would not be matched.
**[source::<source>] and [host::<host>] stanza match language:**
Match expressions must match the entire name, not just a substring. Match
expressions are based on a full implementation of Perl-compatible regular
expressions (PCRE) with the translation of "...", "*", and ".". Thus, "."
matches a period, "*" matches non-directory separators, and "..." matches
any number of any characters.
For more information search the Splunk documentation for "specify input
paths with wildcards".
**[<spec>] stanza pattern collisions:**
Suppose the source of a given input matches multiple [source::<source>]
patterns. If the [<spec>] stanzas for these patterns each supply distinct
settings, Splunk software applies all of these settings.
However, suppose two [<spec>] stanzas supply the same setting. In this case,
Splunk software chooses the value to apply based on the ASCII order of the
patterns in question.
For example, take this source:
source::az
and the following colliding patterns:
[source::...a...]
sourcetype = a
[source::...z...]
sourcetype = z
In this case, the settings provided by the pattern [source::...a...] take
precedence over those provided by [source::...z...], and sourcetype ends up
with "a" as its value.
To override this default ASCII ordering, use the priority key:
[source::...a...]
sourcetype = a
priority = 5
[source::...z...]
sourcetype = z
priority = 10
Assigning a higher priority to the second stanza causes sourcetype to have
the value "z".
**Case-sensitivity for [<spec>] stanza matching:**
By default, [source::<source>] and [<sourcetype>] stanzas match in a
case-sensitive manner, while [host::<host>] stanzas match in a
case-insensitive manner. This is a convenient default, given that DNS names
are case-insensitive.
To force a [host::<host>] stanza to match in a case-sensitive manner use the
"(?-i)" option in its pattern.
For example:
[host::foo]
FIELDALIAS-a = a AS one
[host::(?-i)bar]
FIELDALIAS-b = b AS two
The first stanza actually applies to events with host values of "FOO" or
"Foo". The second stanza, on the other hand, does not apply to events with
host values of "BAR" or "Bar".
**Building the final [<spec>] stanza:**
The final [<spec>] stanza is built by layering together (1) literal-matching
stanzas (stanzas which match the string literally) and (2) any
regex-matching stanzas, according to the value of the priority field.
If not specified, the default value of the priority key is:
* 0 for pattern-matching stanzas.
* 100 for literal-matching stanzas.
NOTE: Setting the priority key to a value greater than 100 causes the
pattern-matched [<spec>] stanzas to override the values of the
literal-matching [<spec>] stanzas.
The priority key can also be used to resolve collisions
between [<sourcetype>] patterns and [host::<host>] patterns. However, be aware
that the priority key does *not* affect precedence across <spec> types. For
example, [<spec>] stanzas with [source::<source>] patterns take priority over
stanzas with [host::<host>] and [<sourcetype>] patterns, regardless of their
respective priority key values.
#******************************************************************************
# The possible setting/value pairs for props.conf, and their
# default values, are:
#******************************************************************************
priority = <number>
* Overrides the default ASCII ordering of matching stanza names
# International characters and character encoding.
CHARSET = <string>
* When set, Splunk software assumes the input from the given [<spec>] is in
the specified encoding.
* Can only be used as the basis of [<sourcetype>] or [source::<spec>],
not [host::<spec>].
* A list of valid encodings can be retrieved using the command "iconv -l" on
most *nix systems.
* If an invalid encoding is specified, a warning is logged during initial
configuration and further input from that [<spec>] is discarded.
* If the source encoding is valid, but some characters from the [<spec>] are
not valid in the specified encoding, then the characters are escaped as
hex (for example, "\xF3").
* When set to "AUTO", Splunk software attempts to automatically determine the
character encoding and convert text from that encoding to UTF-8.
* For a complete list of the character sets Splunk software automatically
detects, see the online documentation.
* This setting applies at input time, when data is first read by Splunk
software, such as on a forwarder that has configured inputs acquiring the
data.
* Default (on Windows machines): AUTO
* Default (otherwise): UTF-8
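* Example: As a sketch, a stanza along these lines (the source path is
  hypothetical) forces a specific encoding for one legacy input:
      [source::/var/log/legacy_app.log]
      CHARSET = ISO-8859-1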
#******************************************************************************
# Line breaking
#******************************************************************************
# Use the following settings to define the length of a line.
TRUNCATE = <non-negative integer>
* The default maximum line length, in bytes.
* Although this is in bytes, line length is rounded down when this would
otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
a sign of garbage data).
* Default: 10000
LINE_BREAKER = <regular expression>
* Specifies a regex that determines how the raw text stream is broken into
initial events, before line merging takes place. (See the SHOULD_LINEMERGE
setting, below.)
* The regex must contain a capturing group -- a pair of parentheses which
defines an identified subcomponent of the match.
* Wherever the regex matches, Splunk software considers the start of the first
capturing group to be the end of the previous event, and considers the end
of the first capturing group to be the start of the next event.
* The contents of the first capturing group are discarded, and are not
present in any event. You are telling Splunk software that this text comes
between lines.
* NOTE: You get a significant boost to processing speed when you use
LINE_BREAKER to delimit multi-line events (as opposed to using
SHOULD_LINEMERGE to reassemble individual lines into multi-line events).
* When using LINE_BREAKER to delimit events, SHOULD_LINEMERGE should be set
to false, to ensure no further combination of delimited events occurs.
* Using LINE_BREAKER to delimit events is discussed in more detail in the
documentation. Search the documentation for "configure event line breaking"
for details.
* Default: ([\r\n]+) (Data is broken into an event for each line,
delimited by any number of carriage return or newline characters.)
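* Example: As a sketch, for multi-line events that each begin with a
  bracketed ISO-style date (the source type name is hypothetical), you
  might pair LINE_BREAKER with SHOULD_LINEMERGE as recommended above:
      [my_multiline_sourcetype]
      LINE_BREAKER = ([\r\n]+)\[\d{4}-\d{2}-\d{2}
      SHOULD_LINEMERGE = false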
** Special considerations for LINE_BREAKER with branched expressions **
When using LINE_BREAKER with completely independent patterns separated by
pipes, some special issues come into play.
For example: LINE_BREAKER = pattern1|pattern2|pattern3
NOTE: This is not about all forms of alternation. For instance, there is
nothing particularly special about
    LINE_BREAKER = ([\r\n])+(one|two|three)
where the top level remains a single expression.
CAUTION: Relying on these rules is NOT encouraged. Simpler is better, in
both regular expressions and the complexity of the behavior they rely on.
If possible, reconstruct your regex to have a leftmost capturing group
that always matches.
It might be useful to use non-capturing groups if you need to express a group
before the text to discard.
Example: LINE_BREAKER = (?:one|two)([\r\n]+)
* This matches the text one, or two, followed by any amount of
newlines or carriage returns. The one-or-two group is non-capturing
via the ?: prefix and is skipped by LINE_BREAKER.
* A branched expression can match without the first capturing group
matching, so the line breaker behavior becomes more complex.
Rules:
1: If the first capturing group is part of a match, it is considered the
linebreak, as normal.
2: If the first capturing group is not part of a match, the leftmost
capturing group which is part of a match is considered the linebreak.
3: If no capturing group is part of the match, the linebreaker assumes
that the linebreak is a zero-length break immediately preceding the match.
Example 1: LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3
* A line ending with 'end' followed by a line beginning with 'begin' would
match the first branch, and the first capturing group would have a match
according to rule 1. That particular newline would become a break
between lines.
* A line ending with 'end2' followed by a line beginning with 'begin2'
would match the second branch and the second capturing group would have
a match. That second capturing group would become the linebreak
according to rule 2, and the associated newline would become a break
between lines.
* The text 'begin3' anywhere in the file at all would match the third
branch, and there would be no capturing group with a match. A linebreak
would be assumed immediately prior to the text 'begin3' so a linebreak
would be inserted prior to this text in accordance with rule 3. This
means that a linebreak occurs before the text 'begin3' at any
point in the text, whether a linebreak character exists or not.
Example 2: Example 1 would probably be better written as follows. This is
not equivalent for all possible files, but for most real files
would be equivalent.
LINE_BREAKER = end2?(\n)begin(2|3)?
LINE_BREAKER_LOOKBEHIND = <integer>
* The number of bytes before the end of the raw data chunk
to which Splunk software should apply the 'LINE_BREAKER' regex.
* When there is leftover data from a previous raw chunk,
LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of
the raw chunk (with the next chunk concatenated) where Splunk software
applies the LINE_BREAKER regex.
* You might want to increase this value from its default if you are
dealing with especially large or multi-line events.
* Default: 100
# Use the following settings to specify how multi-line events are handled.
SHOULD_LINEMERGE = <boolean>
* Whether or not to combine several lines of data into a single
multiline event, based on the configuration settings listed in
this subsection.
* When you set this to "true", Splunk software combines several lines of data
into a single multi-line event, based on values you configure
in the following settings.
* When you set this to "false", Splunk software does not combine lines of
data into multiline events.
* Default: true
# When SHOULD_LINEMERGE is set to true, use the following settings to
# define how Splunk software builds multi-line events.
BREAK_ONLY_BEFORE_DATE = <boolean>
* Whether or not to create a new event if a new line with a date is encountered
in the data stream.
* When you set this to "true", Splunk software creates a new event only if it
encounters a new line with a date.
* NOTE: When using DATETIME_CONFIG = CURRENT or NONE, this setting is not
meaningful, as timestamps are not identified.
* Default: true
BREAK_ONLY_BEFORE = <regular expression>
* When set, Splunk software creates a new event only if it encounters a new
line that matches the regular expression.
* Default: empty string
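* Example: As a sketch, to merge lines so that new events start only at
  lines beginning with a severity keyword (a hypothetical log layout):
      SHOULD_LINEMERGE = true
      BREAK_ONLY_BEFORE = ^(ERROR|WARN|INFO)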
MUST_BREAK_AFTER = <regular expression>
* When set, Splunk software creates a new event for the next input line only
if the regular expression matches the current line.
* It is possible for the software to break before the current line if
another rule matches.
* Default: empty string
MUST_NOT_BREAK_AFTER = <regular expression>
* When set, and the current line matches the regular expression, Splunk software
does not break on any subsequent lines until the MUST_BREAK_AFTER expression
matches.
* Default: empty string
MUST_NOT_BREAK_BEFORE = <regular expression>
* When set, and the current line matches the regular expression, Splunk
software does not break the last event before the current line.
* Default: empty string
MAX_EVENTS = <integer>
* The maximum number of input lines to add to any event.
* Splunk software breaks after it reads the specified number of lines.
* Default: 256
MAX_EXPECTED_EVENT_LINES = <integer>
* The number of expected input lines per event, on average.
* Splunk software optimizes memory allocation for this number of lines.
* Do not change this setting without contacting Splunk Support.
* Default: 7
ROUTE_EVENTS_OLDER_THAN = <non-negative integer>[s|m|h|d]
* If set, AggregatorProcessor routes events older than 'ROUTE_EVENTS_OLDER_THAN'
to nullQueue after timestamp extraction.
* Default: no default
# Use the following settings to improve load balancing from a universal
# forwarder (UF).
# NOTE: The EVENT_BREAKER properties are applicable for Splunk Universal
# Forwarder instances only.
EVENT_BREAKER_ENABLE = <boolean>
* Whether or not a universal forwarder (UF) uses the 'ChunkedLBProcessor'
data processor to improve distribution of events to receiving
indexers for a given source type.
* When set to true, a UF splits incoming data with a
light-weight chunked line breaking processor ('ChunkedLBProcessor')
so that data is distributed fairly evenly amongst multiple indexers.
* When set to false, a UF uses standard load-balancing methods to
send events to indexers.
* Use this setting on a UF to indicate that data
should be split on event boundaries across indexers, especially
for large files.
* This setting is only valid on universal forwarder instances.
* Default: false
# Use the following to define event boundaries for multi-line events
# For single-line events, the default settings should suffice
EVENT_BREAKER = <regular expression>
* A regular expression that specifies the event boundary for a
universal forwarder to use to determine when it can send events
to an indexer.
* The regular expression must contain a capturing group
(a pair of parentheses that defines an identified sub-component
of the match.)
* When the UF finds a match, it considers the first capturing group
to be the end of the previous event, and the end of the capturing group
to be the beginning of the next event.
* At this point, the forwarder can then change the receiving indexer
based on these event boundaries.
* This setting is only active if you set 'EVENT_BREAKER_ENABLE' to
"true", only works on universal forwarders, and
works best with multiline events.
* Default: "([\r\n]+)"
LB_CHUNK_BREAKER = <regular expression>
* DEPRECATED. Use 'EVENT_BREAKER' instead.
* A regular expression that specifies the event boundary for a
universal forwarder to use to determine when it can send events
to an indexer.
* The regular expression must contain a capturing group
(a pair of parentheses that defines an identified sub-component
of the match.)
* When the UF finds a match, it considers the first capturing group
to be the end of the previous event, and the end of the capturing group
to be the beginning of the next event.
* Splunk software discards the contents of the first capturing group.
This content will not be present in any event, as Splunk software
considers this text to come between lines.
* At this point, the forwarder can then change the receiving indexer
based on these event boundaries.
* This setting is only used if [httpout] is configured in outputs.conf.
* Default: ([\r\n]+)
LB_CHUNK_BREAKER_TRUNCATE = <non-negative integer>
* The maximum length, in bytes, of a chunk of data that a forwarder
sends over HTTP.
* Although this is a byte value, the forwarder rounds down the length
when this would otherwise land mid-character for multi-byte characters.
* This setting is valid only if you configure an [httpout] stanza in the
outputs.conf configuration file.
* Default: 2000000
#******************************************************************************
# Timestamp extraction configuration
#******************************************************************************
DATETIME_CONFIG = [<filename relative to $SPLUNK_HOME> | CURRENT | NONE]
* Specifies which file configures the timestamp extractor, which identifies
timestamps from the event text.
* This setting may also be set to "NONE" to prevent the timestamp
extractor from running or "CURRENT" to assign the current system time to
each event.
* "CURRENT" sets the time of the event to the time that the event was
merged from lines, or worded differently, the time it passed through the
aggregator processor.
* "NONE" leaves the event time set to whatever time was selected by
the input layer
* For data sent by Splunk forwarders over the Splunk-to-Splunk protocol,
the input layer is the time that was selected on the forwarder by
its input behavior (as below).
* For file-based inputs (monitor, batch) the time chosen is the
modification timestamp on the file being read.
* For other inputs, the time chosen is the current system time when
the event is read from the pipe/socket/etc.
* Both "CURRENT" and "NONE" explicitly disable the per-text timestamp
identification, so the default event boundary detection
(BREAK_ONLY_BEFORE_DATE = true) is likely to not work as desired. When
using these settings, use 'SHOULD_LINEMERGE' and/or the 'BREAK_ONLY_*',
'MUST_BREAK_*' settings to control event merging.
* For more information on 'DATETIME_CONFIG' and datetime.xml, see "Configure
advanced timestamp recognition with datetime.xml" in the Splunk Documentation.
* Default: /etc/datetime.xml (for example, $SPLUNK_HOME/etc/datetime.xml).
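* Example: As a sketch, to stamp every event of a source type (hypothetical
  name) with the time it passes through the aggregator, rather than a time
  parsed from the event text:
      [heartbeat_data]
      DATETIME_CONFIG = CURRENT
      SHOULD_LINEMERGE = false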
TIME_PREFIX = <regular expression>
* If set, Splunk software scans the event text for a match for this regex
in event text before attempting to extract a timestamp.
* The timestamping algorithm only looks for a timestamp in the text
following the end of the first regex match.
* For example, if 'TIME_PREFIX' is set to "abc123", only text following the
first occurrence of the text abc123 is used for timestamp extraction.
* If the 'TIME_PREFIX' cannot be found in the event text, timestamp extraction
does not occur.
* Default: empty string
MAX_TIMESTAMP_LOOKAHEAD = <integer>
* The number of characters into an event Splunk software should look
for a timestamp.
* This constraint to timestamp extraction is applied from the point of the
'TIME_PREFIX'-set location.
* For example, if 'TIME_PREFIX' positions a location 11 characters into the
event, and MAX_TIMESTAMP_LOOKAHEAD is set to 10, timestamp extraction is
constrained to characters 11 through 20.
* If set to 0 or -1, the length constraint for timestamp recognition is
effectively disabled. This can have negative performance implications
which scale with the length of input lines (or with event size when
'LINE_BREAKER' is redefined for event splitting).
* Default: 128
TIME_FORMAT = <strptime-style format>
* Specifies a "strptime" format string to extract the date.
* "strptime" is an industry standard for designating time formats.
* For more information on strptime, see "Configure timestamp recognition" in
the online documentation.
* TIME_FORMAT starts reading after the TIME_PREFIX. If both are specified,
the TIME_PREFIX regex must match up to and including the character before
the TIME_FORMAT date.
* For good results, the <strptime-style format> should describe the day of
the year and the time of day.
* Default: empty string
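* Example: As a sketch, for events shaped like
  ts="2024-01-15 08:30:00 -0500" msg=... (a hypothetical layout), the
  timestamp settings combine as follows:
      TIME_PREFIX = ts="
      TIME_FORMAT = %Y-%m-%d %H:%M:%S %z
      MAX_TIMESTAMP_LOOKAHEAD = 25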
DETERMINE_TIMESTAMP_DATE_WITH_SYSTEM_TIME = <boolean>
* Whether or not the Splunk platform uses the current system time to
determine the date of an event timestamp that has no date.
* If set to "true", the platform uses the system time to determine the
date for an event that has a timestamp without a date.
* If the event has a timestamp that is less than three hours
later than the current system time, then the platform presumes
that the timestamp date for that event is the current date.
* Otherwise, it presumes that the timestamp date would be in the future,
and uses the previous day's date instead.
* If set to "false", the platform uses the last successfully-parsed
timestamp to determine the timestamp date for the event.
* Default: false
TZ = <timezone identifier>
* The algorithm for determining the time zone for a particular event is as
follows:
* If the event has a timezone in its raw text (for example, UTC, -08:00),
use that.
* If TZ is set to a valid timezone string, use that.
* If the event was forwarded, and the forwarder-indexer connection uses
the version 6.0 and higher forwarding protocol, use the timezone provided
by the forwarder.
* Otherwise, use the timezone of the system that is running splunkd.
* Default: empty string
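* Example: As a sketch, for a source whose events carry no timezone
  indicator but are known to be generated in US Eastern time:
      TZ = US/Eastern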
TZ_ALIAS = <key=value>[,<key=value>]...
* Provides Splunk software admin-level control over how timezone strings
extracted from events are interpreted.
* For example, EST can mean Eastern (US) Standard time, or Eastern
(Australian) Standard time. There are many other three letter timezone
acronyms with many expansions.
* There is no requirement to use 'TZ_ALIAS' if the traditional Splunk software
default mappings for these values have been as expected. For example, EST
maps to the Eastern US by default.
* Has no effect on the 'TZ' value. This only affects timezone strings from event
text, either from any configured 'TIME_FORMAT', or from pattern-based guess
fallback.
* The setting is a list of key=value pairs, separated by commas.
* The key is matched against the text of the timezone specifier of the
event, and the value is the timezone specifier to use when mapping the
timestamp to UTC/GMT.
* The value is another TZ specifier which expresses the desired offset.
* Example: TZ_ALIAS = EST=GMT+10:00 (See props.conf.example for more/full
examples)
* Default: not set
MAX_DAYS_AGO = <integer>
* The maximum number of days in the past, from the current date as
provided by the input layer (for example, forwarder current time, or
modtime for files), that an extracted date can be valid.
* Splunk software still indexes events with dates older than 'MAX_DAYS_AGO'
with the timestamp of the last acceptable event.
* If no such acceptable event exists, new events with timestamps older
than 'MAX_DAYS_AGO' use the current timestamp.
* For example, if MAX_DAYS_AGO = 10, Splunk software applies the timestamp
of the last acceptable event to events with extracted timestamps older
than 10 days in the past. If no acceptable event exists, Splunk software
applies the current timestamp.
* If your data is older than 2000 days, increase this setting.
* Highest legal value: 10951 (30 years).
* Default: 2000 (5.48 years).
MAX_DAYS_HENCE = <integer>
* The maximum number of days in the future, from the current date as
provided by the input layer (for example, forwarder current time, or
modtime for files), that an extracted date can be valid.
* Splunk software still indexes events with dates more than 'MAX_DAYS_HENCE'
in the future with the timestamp of the last acceptable event.
* If no such acceptable event exists, new events
with timestamps after 'MAX_DAYS_HENCE' use the current timestamp.
* For example, if MAX_DAYS_HENCE = 3, Splunk software applies the timestamp of
the last acceptable event to events with extracted timestamps more than 3
days in the future. If no acceptable event exists, Splunk software applies
the current timestamp.
* The default value includes dates from one day in the future.
* If your servers have the wrong date set or are in a timezone that is one
day ahead, increase this value to at least 3.
* NOTE: False positives are less likely with a smaller window. Change with
caution.
* Highest legal value: 10950 (30 years).
* Default: 2
MAX_DIFF_SECS_AGO = <integer>
* This setting prevents Splunk software from rejecting events with timestamps
that are out of order.
* Do not use this setting to filter events. Splunk software uses
complicated heuristics for time parsing.
* Splunk software warns you if an event timestamp is more than
'MAX_DIFF_SECS_AGO' seconds BEFORE the previous timestamp and does not
have the same time format as the majority of timestamps from the source.
* After Splunk software throws the warning, it only rejects an event if it
cannot apply a timestamp to the event. (For example, if Splunk software
cannot recognize the time of the event.)
* If your timestamps are wildly out of order, consider increasing
this value.
* NOTE: If the events contain time but not date (date determined another way,
such as from a filename), this check only considers the hour. (No
one-second granularity for this purpose.)
* Highest legal value: 2147483646 (68.1 years).
* Default: 3600 (one hour).
MAX_DIFF_SECS_HENCE = <integer>
* This setting prevents Splunk software from rejecting events with timestamps
that are out of order.
* Do not use this setting to filter events. Splunk software uses
complicated heuristics for time parsing.
* Splunk software warns you if an event timestamp is more than
'MAX_DIFF_SECS_HENCE' seconds AFTER the previous timestamp and does not
have the same time format as the majority of timestamps from the source.
* After Splunk software throws the warning, it only rejects an event if it
cannot apply a timestamp to the event. (For example, if Splunk software
cannot recognize the time of the event.)
* If your timestamps are wildly out of order, or you have logs that
are written less than once a week, consider increasing this value.
* Highest legal value: 2147483646 (68.1 years).
* Default: 604800 (one week).
ADD_EXTRA_TIME_FIELDS = [none | subseconds | all | <boolean>]
* Whether or not Splunk software automatically generates and indexes the
following keys with events:
* date_hour, date_mday, date_minute, date_month, date_second, date_wday,
date_year, date_zone, timestartpos, timeendpos, timestamp.
* These fields are never required, and may be turned off as desired.
* If set to "none" (or false), all indextime data about the timestamp is
stripped out. This removes the above fields but also removes information
about the sub-second timestamp granularity. When events are searched,
only the second-granularity timestamp is returned as part of the
"_time" field.
* If set to "subseconds", the above fields are stripped out but the data about
subsecond timestamp granularity is left intact.
* If set to "all" (or true), all of the indextime fields from the time
parser are included.
* Default: true (Enabled for most data sources.)
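* Example: As a sketch, to keep subsecond timestamp granularity while
  dropping the date_* fields for a high-volume source type:
      ADD_EXTRA_TIME_FIELDS = subseconds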
#******************************************************************************
# Structured Data Header Extraction and configuration
#******************************************************************************
* These settings apply at input time, when data is first read by Splunk
software, such as on a forwarder that has configured inputs acquiring the
data.
# These special string delimiters, which are single ASCII characters,
# can be used in the settings that follow, which state
# "You can use the delimiters for structured data header extraction with
# this setting."
#
# You can only use a single delimiter for any setting.
# It is not possible to configure multiple delimiters or characters per
# setting.
#
# Example of using the delimiters:
#
# FIELD_DELIMITER=space
# * Tells Splunk software to use the space character to separate fields
# in the specified source.
# space - Space separator (separates on a single space)
# tab / \t - Tab separator
# fs - ASCII file separator
# gs - ASCII group separator
# rs - ASCII record separator
# us - ASCII unit separator
# \xHH - HH is two hexadecimal digits to use as a separator
#        Example: \x14 - select 0x14 as the delimiter
# none - (Valid for FIELD_QUOTE and HEADER_FIELD_QUOTE only)
#        null termination character separator
# whitespace / ws - (Valid for FIELD_DELIMITER and
#        HEADER_FIELD_DELIMITER only)
#        treats any number of spaces and tabs as a
#        single delimiter
INDEXED_EXTRACTIONS = <CSV|TSV|PSV|W3C|JSON|HEC>
* The type of file that Splunk software should expect for a given source
type, and the extraction and/or parsing method that should be used on the
file.
* The following values are valid for 'INDEXED_EXTRACTIONS':
CSV - Comma separated value format
TSV - Tab-separated value format
PSV - pipe ("|")-separated value format
W3C - World Wide Web Consortium (W3C) Extended Log File Format
JSON - JavaScript Object Notation format
HEC - Interpret file as a stream of JSON events in the same format as the
HTTP Event Collector (HEC) input.
* These settings change the defaults for other settings in this subsection
to appropriate values, specifically for these formats.
* The HEC format lets events override many details on a per-event basis, such
as the destination index. Use this value to read data which you know to be
well-formatted and safe to index with little or no processing, such as
data generated by locally written tools.
* When 'INDEXED_EXTRACTIONS = JSON' for a particular source type, do not also
set 'KV_MODE = json' for that source type. This causes the Splunk software to
extract the JSON fields twice: once at index time, and again at search time.
* Default: not set
METRICS_PROTOCOL = <STATSD|COLLECTD_HTTP>
* Which protocol the incoming metric data is using:
STATSD: Supports the statsd protocol, in the following format:
<metric name>:<value>|<metric type>
Use the 'STATSD-DIM-TRANSFORMS' setting to manually extract
dimensions for the above format. Splunk software auto-extracts
dimensions when the data has "#" as dimension delimiter
as shown below:
<metric name>:<value>|<metric type>|#<dim1>:<val1>,
<dim2>:<val2>...
COLLECTD_HTTP: Data from the write_http collectd plugin, parsed as
             streaming JSON documents, where the _value lives in the
             "values" array, the dimension names are in "dsnames", and the
             metric type (for example, counter vs gauge) is derived from
             "dstypes".
* Default (for event (non-metric) data): not set
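* Example: As a sketch, a source type (hypothetical name) receiving
  statsd-formatted lines such as cpu.idle:0.5|g could be declared as:
      [statsd_listener]
      METRICS_PROTOCOL = STATSD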
STATSD-DIM-TRANSFORMS = <statsd_dim_stanza_name1>,<statsd_dim_stanza_name2>..
* Valid only when 'METRICS_PROTOCOL' is set to "statsd".
* A comma separated list of transforms stanza names which are used to extract
dimensions from statsd metric data.
* Optional for source types that have only one transforms stanza for
extracting dimensions, where the stanza name is the same as the source type
name.
* Stanza names must start with the prefix "statsd-dims:".
For example, in props.conf:
STATSD-DIM-TRANSFORMS = statsd-dims:extract_ip
In transforms.conf, the stanza must carry the same prefix:
[statsd-dims:extract_ip]
* Default: not set
STATSD_EMIT_SINGLE_MEASUREMENT_FORMAT = <boolean>
* Valid only when 'METRICS_PROTOCOL' is set to 'statsd'.
* This setting controls the metric data point format emitted by the statsd
processor.
* When set to true, the statsd processor produces metric data points in
single-measurement format. This format allows only one metric measurement per
data point, with one key-value pair for the metric name
(metric_name=<metric_name>) and another key-value pair for the measurement
value (_value=<numerical_value>).
* When set to false, the statsd processor produces metric data points in
multiple-measurement format. This format allows multiple metric measurements
per data point, where each metric measurement follows this syntax:
metric_name:<metric_name>=<numerical_value>
* We recommend you set this to 'true' for statsd data, because the statsd data
format is single-measurement per data point. This practice enables you to use
downstream transforms to edit the metric_name if necessary. Multiple-value
metric data points are harder to process with downstream transforms.
* Default: true
METRIC-SCHEMA-TRANSFORMS = <metric-schema:stanza_name>[,<metric-schema:stanza_name>]...
* A comma-separated list of metric-schema stanza names from transforms.conf
that the Splunk platform uses to create multiple metrics from index-time
field extractions of a single log event.
* NOTE: This setting is valid only for index-time field extractions.
You can set up the TRANSFORMS field extraction configuration to create
index-time field extractions. The Splunk platform always applies
METRIC-SCHEMA-TRANSFORMS after index-time field extraction takes place.
* Optional.
* Default: empty
PREAMBLE_REGEX = <regex>
* A regular expression that lets Splunk software ignore "preamble lines",
or lines that occur before lines that represent structured data.
* When set, Splunk software ignores these preamble lines,
based on the pattern you specify.
* Default: not set
FIELD_HEADER_REGEX = <regex>
* A regular expression that specifies a pattern for prefixed headers.
* The actual header starts after the pattern. It is not included in
the header field.
* This setting supports the use of the special characters described above.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
HEADER_FIELD_LINE_NUMBER = <integer>
* The line number of the line within the specified file or source that
contains the header fields.
* If set to 0, Splunk software attempts to
locate the header fields within the file automatically.
* Default: 0
FIELD_DELIMITER = <character>
* Which character delimits or separates fields in the
specified file or source.
* You can use the delimiters for structured data header extraction with
this setting.
* This setting supports the use of the special characters described above.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
HEADER_FIELD_DELIMITER = <character>
* Which character delimits or separates header fields in
the specified file or source.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
HEADER_FIELD_ACCEPTABLE_SPECIAL_CHARACTERS = <string>
* This setting specifies the special characters that are allowed in header
fields.
* When this setting is not set, the processor replaces all characters in header
field names that are neither alphanumeric nor a space (" ") with underscores.
* For example, if you import a CSV file, and one of the header field names is
"field.name", the processor replaces "field.name" with "field_name", and
imports the field this way.
* If you configure this setting, the processor does not perform a character
replacement in header field names if the special character it encounters
matches one that you specify in the setting value.
* For example, if you configure this setting to ".", the processor does not
replace the "." characters in header field names with underscores.
* This setting only supports characters with ASCII codes below 128.
* CAUTION: Certain special characters can cause the Splunk instance to
malfunction.
* For example, the field name "fieldname=a" is currently sanitized to
"fieldname_a" and the search query "fieldname_a=val" works fine. If the
setting is set to "=" and the field name "fieldname=a" is allowed, it could
result in an invalid-syntax search query "fieldname=a=val".
* Default: empty string
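* Example: As a sketch, to preserve dots in CSV header names such as
  "field.name" instead of having them rewritten to "field_name":
      HEADER_FIELD_ACCEPTABLE_SPECIAL_CHARACTERS = .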
FIELD_QUOTE = <character>
* The character to use for quotes in the specified file
or source.
* You can use the delimiters for structured data header extraction with
this setting.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
HEADER_FIELD_QUOTE = <character>
* The character to use for quotes in the header of the
specified file or source.
* You can use the delimiters for structured data header extraction with
this setting.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
TIMESTAMP_FIELDS = [ <string>,..., <string>]
* In some CSV and structured files, the timestamp spans multiple
fields in the event, separated by delimiters.
* Use this setting to specify, as a comma-separated list, all fields
that constitute the timestamp.
* If not specified, Splunk software tries to automatically extract the
timestamp of the event.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
FIELD_NAMES = [ <string>,..., <string>]
* Some CSV and structured files might have missing headers.
* Use this setting to specify the header field names directly.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
MISSING_VALUE_REGEX = <regex>
* The placeholder to use in events where no value is present.
* The default can vary if 'INDEXED_EXTRACTIONS' is set.
* Default (if 'INDEXED_EXTRACTIONS' is not set): not set
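* Example: As a sketch, a headerless CSV source type (the stanza and field
  names are hypothetical) whose timestamp spans two fields might combine the
  settings above as:
      [my_headerless_csv]
      INDEXED_EXTRACTIONS = CSV
      FIELD_DELIMITER = ,
      FIELD_NAMES = date,time,level,message
      TIMESTAMP_FIELDS = date,time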
JSON_TRIM_BRACES_IN_ARRAY_NAMES = <boolean>
* Whether or not the JSON parser for 'INDEXED_EXTRACTIONS' strips curly
braces from names of fields that are defined as arrays in JSON events.
* When the JSON parser extracts fields from JSON events, by default, it
extracts array field names with the curly braces that indicate they
are arrays ("{}") intact.
* For example, given the following partial JSON event:
{"datetime":"08-20-2015 10:32:25.267 -0700","log_level":"INFO",...,
data:{...,"fs_type":"ext4","mount_point":["/disk48","/disk22"],...}}
Because the "mount_point" field in this event is an array of two
values ("/disk48" and "/disk22"), the JSON parser sees the field as an
array, and extracts it as such, including the braces that identify
it as an array. The resulting field name is "data.mount_point{}".
* Set 'JSON_TRIM_BRACES_IN_ARRAY_NAMES' to "true" if you want the JSON
parser to strip these curly braces from array field names. (In this
example, the resulting field is instead "data.mount_point").
* CAUTION: Setting this to "true" makes array field names that are extracted
at index time through the JSON parser inconsistent with search-time
extraction of array field names through the 'spath' search command.
* Default: false
#******************************************************************************
# Field extraction configuration
#******************************************************************************
NOTE: If this is your first time configuring field extractions in
props.conf, review the following information first. Additional
information is also available in the Getting Data In Manual
in the Splunk Documentation.
There are three different "field extraction types" that you can use to
configure field extractions: TRANSFORMS, REPORT, and EXTRACT. They differ in
two significant ways: 1) whether they create indexed fields (fields
extracted at index time) or extracted fields (fields extracted at search
time), and 2), whether they include a reference to an additional component
called a "field transform," which you define separately in transforms.conf.
**Field extraction configuration: index time versus search time**
Use the TRANSFORMS field extraction type to create index-time field
extractions. Use the REPORT or EXTRACT field extraction types to create
search-time field extractions.
NOTE: Index-time field extractions have performance implications.
Create additions to the default set of indexed fields ONLY
in specific circumstances. Whenever possible, extract
fields only at search time.
There are times when you may find that you need to change or add to your set
of indexed fields. For example, you may have situations where certain
search-time field extractions are noticeably impacting search performance.
This can happen when the value of a search-time extracted field exists
outside of the field more often than not. For example, if you commonly
search a large event set with the expression company_id=1 but the value 1
occurs in many events that do *not* have company_id=1, you may want to add
company_id to the list of fields extracted by Splunk software at index time.
This is because at search time, Splunk software checks each
instance of the value 1 to see if it matches company_id, and that check
slows down performance when Splunk software searches a large set of
data.
Conversely, if you commonly search a large event set with expressions like
company_id!=1 or NOT company_id=1, and the field company_id nearly *always*
takes on the value 1, you may want to add company_id to the list of fields
extracted by Splunk software at index time.
For more information about index-time field extraction, search the
documentation for "index-time extraction." For more information about
search-time field extraction, search the documentation for
"search-time extraction."
**Field extraction configuration: field transforms vs. "inline" (props.conf only) configs**
The TRANSFORMS and REPORT field extraction types reference an additional
component called a field transform, which you define separately in
transforms.conf. Field transforms contain a field-extracting regular
expression and other settings that govern the way that the transform
extracts fields. Field transforms are always created in conjunction with
field extraction stanzas in props.conf; they do not stand alone.
The EXTRACT field extraction type is considered to be "inline," which means
that it does not reference a field transform. It contains the regular
expression that Splunk software uses to extract fields at search time. You
can use EXTRACT to define a field extraction entirely within props.conf, no