Add TripPy to GoBot #1505

Open
wants to merge 158 commits into
base: dev
86dc144
wip implement masking for formfilling
oserikov Sep 1, 2020
cfed8d4
wip implement masking for choosing only informative actions
oserikov Oct 1, 2020
3052da6
Merge remote-tracking branch 'origin/master' into feature/gobot_naive…
oserikov Oct 5, 2020
80cb0a6
wip implement masking for choosing only informative actions
oserikov Oct 5, 2020
637eea8
wip implement masking for choosing only informative actions
oserikov Oct 6, 2020
7ee94f5
wip implement masking for choosing only informative actions
oserikov Oct 6, 2020
9a77ca8
wip formfilling, added some todos
oserikov Oct 14, 2020
03c670b
now rasa config reader parses forms
oserikov Oct 19, 2020
9983642
added from_yaml reader for domain knowledge
oserikov Oct 19, 2020
0a3d493
added from_yaml reader for domain knowledge, added type hints for dom…
oserikov Oct 19, 2020
f9e28ba
added reading slots mapping from domain.yml
oserikov Oct 19, 2020
e0ae6be
added augment stories with get_slot_calls.
oserikov Oct 22, 2020
2a1536e
added proper form name parsing in story md
oserikov Oct 22, 2020
317cc39
fix: typo
oserikov Oct 22, 2020
3c3949d
fix: typo
oserikov Oct 22, 2020
c4c99d9
fix: typo
oserikov Oct 22, 2020
91dc937
fix: typo
oserikov Oct 22, 2020
b61fff8
fix: typo
oserikov Oct 22, 2020
72f8aee
fix: typo
oserikov Oct 22, 2020
41100ff
fix: typo
oserikov Oct 22, 2020
87b6b59
fix: typo
oserikov Oct 22, 2020
b52a558
merge deeppavlov/dataset_readers/md_yaml_dialogs_reader from master
oserikov Nov 20, 2020
3b2d5b5
Merge remote-tracking branch 'origin/master' into feature/gobot_naive…
oserikov Nov 20, 2020
75866b1
fix: append -> extend
oserikov Nov 20, 2020
0921bbf
wip forms debugging
oserikov Nov 26, 2020
e68e912
fix: forms augmentation was poorly handled
oserikov Nov 26, 2020
65e4b28
fix: do not load formfilling info when no rasa formfilling data provided
oserikov Nov 27, 2020
2d7896b
fix: added field initialization in constructor
oserikov Nov 30, 2020
729ff04
Merge branch 'feature/gobot_naive_formfilling' of https://github.com/…
oserikov Nov 30, 2020
b2ef5c7
added docstrings and type hints
oserikov Nov 30, 2020
af0d8e8
fix: add newline to the end of the file
oserikov Nov 30, 2020
8e2954d
added docstrings and type hints
oserikov Nov 30, 2020
a4bdb73
added docstrings and type hints
oserikov Nov 30, 2020
5fabd31
added docstrings and type hints
oserikov Nov 30, 2020
df5b8d8
added docstrings and type hints
oserikov Nov 30, 2020
7a00be1
added docstrings and type hints
oserikov Nov 30, 2020
5960342
added docstrings and type hints
oserikov Nov 30, 2020
13e6d7b
added docstrings and type hints
oserikov Nov 30, 2020
954f140
added docstrings and type hints
oserikov Nov 30, 2020
44cf9ae
fix typehint typo
oserikov Nov 30, 2020
e625c09
fix: google style docstring
oserikov Dec 10, 2020
3d43f9e
remove nomoreneeded comment
oserikov Dec 10, 2020
6355b56
fix: add typehint for returned objects
oserikov Dec 10, 2020
445d53d
fix: remove redundant type checks, add typehit
oserikov Dec 10, 2020
bf886c5
fix: remove unused import
oserikov Dec 10, 2020
5f10d5d
fix: remove commented code
oserikov Dec 10, 2020
a5729d3
remove unused error object
oserikov Dec 10, 2020
89a4f41
remove redundant comment
oserikov Dec 10, 2020
60d3218
wip: rulebased gobot system
oserikov Dec 18, 2020
f9aa2b0
wip rulebased gobot
oserikov Dec 19, 2020
a9ea704
wip rulebased gobot
oserikov Dec 20, 2020
b3bf2fa
wip rulebased gobot
oserikov Dec 20, 2020
467ed7f
wip rulebased gobot
oserikov Dec 20, 2020
cc3d147
wip rulebased gobot
oserikov Dec 20, 2020
719b6d3
wip rulebased gobot
oserikov Dec 20, 2020
a2d3f7b
revert: rulebased gobot system
oserikov Dec 24, 2020
e2d2a59
fix dropout err
oserikov Dec 24, 2020
740a897
Update dialogue_state_tracker.py
oserikov Dec 26, 2020
93ec879
Update md_yaml_dialogs_reader.py
oserikov Dec 27, 2020
8d6ad7f
Update md_yaml_dialogs_reader.py
oserikov Dec 27, 2020
cfcd55c
reintroduced rulebased gobot system
oserikov Dec 27, 2020
804736d
Update featurized_tracker.py
oserikov Dec 27, 2020
70bfa0b
Update go_bot.py
oserikov Dec 27, 2020
9eebf0f
Update dialogue_state_tracker.py
oserikov Jan 13, 2021
d84e882
Update dialogue_state_tracker.py
oserikov Jan 13, 2021
7151697
wip move data generation from intent catcher model to the separate it…
oserikov May 7, 2021
5b77127
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 7, 2021
29197ce
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 7, 2021
9cb19d0
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 10, 2021
c307262
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 10, 2021
27fade1
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 10, 2021
61c2938
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 10, 2021
6f35644
wip unify md_yaml_reader and intent_catcher_reader
oserikov May 13, 2021
e59d534
wip codegen for openapi integration
oserikov May 14, 2021
a551f4e
wip codegen for openapi integration
oserikov May 14, 2021
1ef1f9c
wip agent intents and slotfilling
oserikov May 14, 2021
f4b4a3b
wip agent intents and slotfilling
oserikov May 14, 2021
7f93f01
wip agent intents and slotfilling
oserikov May 14, 2021
bb82364
wip agent intents and slotfilling
oserikov May 14, 2021
6816bd9
wip agent intents and slotfilling
oserikov May 14, 2021
a17933b
wip agent intents and slotfilling
oserikov Jun 2, 2021
b4d293f
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 2, 2021
2eb540d
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 2, 2021
90fa560
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 3, 2021
8c63c4e
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 5, 2021
b10107c
Merge branch 'ic_reader' into agent_DST2
oserikov Jun 5, 2021
42425b3
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 5, 2021
9c7a782
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
9ac8af8
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
bb329a3
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
2d5ea4e
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
4f9cb4e
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
035ea73
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
727887f
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
ab23b18
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
38a553d
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 6, 2021
b7634fc
Merge branch 'agent_DST2' into perfluence
oserikov Jun 6, 2021
46b604d
Merge remote-tracking branch 'origin/rulebased_gobot' into perfluence
oserikov Jun 6, 2021
e8999ea
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 7, 2021
98fcede
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 7, 2021
e321ae6
wip unify md_yaml_reader and intent_catcher_reader
oserikov Jun 7, 2021
b224361
wip intents from outside
oserikov Jun 8, 2021
8411a3b
wip slots from outside
oserikov Jun 8, 2021
527478d
wip slots from outside
oserikov Jun 8, 2021
1e93e00
wip slots from outside
oserikov Jun 8, 2021
79a154b
wip templated nlg from outside
oserikov Jun 8, 2021
27f3fa4
wip templated nlg from outside
oserikov Jun 8, 2021
50f4676
wip templated nlg from outside
oserikov Jun 8, 2021
2fddbe4
wip templated nlg from outside
oserikov Jun 8, 2021
b216e3a
wip templated nlg from outside
oserikov Jun 9, 2021
0d4cf04
wip templated nlg from outside
oserikov Jun 9, 2021
95573a6
wip templated nlg from outside
oserikov Jun 11, 2021
85f7297
wip templated nlg from outside
oserikov Jun 23, 2021
9a674f0
wip templated nlg from outside
oserikov Jun 23, 2021
dbcaf73
wip templated nlg from outside
oserikov Jun 26, 2021
5dbdf6a
Init Python files
Muennighoff Jul 5, 2021
0333287
Add TripPy logic
Muennighoff Jul 5, 2021
d9c2644
Generalize JSONNLGManager
Muennighoff Jul 5, 2021
160739f
Add TripPy to registry
Muennighoff Jul 5, 2021
1622a69
Fix naming
Muennighoff Jul 5, 2021
4225667
Remove experimental warmup
Muennighoff Jul 5, 2021
2220760
Remove previous_act_label
Muennighoff Jul 6, 2021
476907c
API Calls at interaction time
Muennighoff Jul 7, 2021
e7c6884
Update Levenshtein Calculation
Muennighoff Jul 9, 2021
c374f65
Add trippy architecture imgs
Muennighoff Jul 11, 2021
9aacb9e
Formatting
Muennighoff Jul 11, 2021
09f836f
Add trippy simple tutorial
Muennighoff Jul 13, 2021
e6f18b8
Add extended Trippy demo
Muennighoff Jul 16, 2021
33258fb
update demo
Muennighoff Jul 16, 2021
3c9df55
Fix image name
Muennighoff Jul 16, 2021
d6dbbbb
Enable Data Parallelism
Muennighoff Jul 17, 2021
3c087a8
Adapt for Mutli-GPU Data Parallelism
Muennighoff Jul 17, 2021
cf7c017
Clarify data parallelism setup
Muennighoff Jul 19, 2021
e64ffd2
Make Multi-GPU working
Muennighoff Jul 21, 2021
2c8f384
Add TripPy+RASA tutorial
Muennighoff Jul 29, 2021
142f6f7
Enable custom APIs
Muennighoff Aug 1, 2021
43e29f2
Add advanced GMaps Example
Muennighoff Aug 4, 2021
39d0446
Revert accidental changes
Muennighoff Aug 4, 2021
54922f9
Add newline
Muennighoff Aug 4, 2021
8be6585
Add newline
Muennighoff Aug 4, 2021
b5ecf27
Update trippy_extended_tutorial.ipynb
Muennighoff Aug 7, 2021
1b9e140
Include API links
Muennighoff Aug 7, 2021
7249984
Log instead of print
Muennighoff Aug 7, 2021
551751b
Explain Roberta
Muennighoff Aug 7, 2021
2d066a5
Clarify dummy
Muennighoff Aug 7, 2021
be2fd8c
Remove unused logging
Muennighoff Aug 7, 2021
a937201
Clarify labelling
Muennighoff Aug 7, 2021
6b05a02
Move loss functions to init
Muennighoff Aug 7, 2021
aafe0e9
Move action loss func
Muennighoff Aug 7, 2021
dbbcf53
Rename variable
Muennighoff Aug 7, 2021
7432ea6
Externalize DST
Muennighoff Aug 8, 2021
fadfe5a
Remove unicode conversion for Py2
Muennighoff Aug 9, 2021
bdfd890
Fix variable name
Muennighoff Aug 12, 2021
1e2ad41
Temporarily mute test
Muennighoff Aug 16, 2021
49984f5
Add requirements
Muennighoff Aug 16, 2021
72e17d0
TFHUB requirement
Muennighoff Aug 17, 2021
f72397a
Merge pull request #1464 from Muennighoff/rulebased_gobot_trippy
oserikov Aug 18, 2021
8b0ac31
Revert "Add Trippy to DeepPavlov"
oserikov Aug 23, 2021
10 changes: 8 additions & 2 deletions deeppavlov/core/common/registry.json
@@ -68,6 +68,7 @@
   "hybrid_ner_model": "deeppavlov.models.ner.NER_model:HybridNerModel",
   "imdb_reader": "deeppavlov.dataset_readers.imdb_reader:ImdbReader",
   "input_splitter": "deeppavlov.models.multitask_bert.multitask_bert:InputSplitter",
+  "intent_catcher_iterator": "deeppavlov.dataset_iterators.intent_catcher_iterator:IntentCatcherIterator",
   "insurance_reader": "deeppavlov.dataset_readers.insurance_reader:InsuranceReader",
   "jieba_tokenizer": "deeppavlov.models.tokenizers.jieba_tokenizer:JiebaTokenizer",
   "joint_tagger_parser": "deeppavlov.models.syntax_parser.joint:JointTaggerParser",
@@ -207,5 +208,10 @@
   "wiki_sqlite_vocab": "deeppavlov.vocabs.wiki_sqlite:WikiSQLiteVocab",
   "wikitionary_100K_vocab": "deeppavlov.vocabs.typos:Wiki100KDictionary",
   "intent_catcher_reader": "deeppavlov.dataset_readers.intent_catcher_reader:IntentCatcherReader",
-  "intent_catcher": "deeppavlov.models.intent_catcher.intent_catcher:IntentCatcher"
-}
+  "intent_catcher": "deeppavlov.models.intent_catcher.intent_catcher:IntentCatcher",
+  "mem_classification_model": "deeppavlov.models.classifiers.memorizing_classifier:MemClassificationModel",
+  "md_yaml_dialogs_iterator": "deeppavlov.dataset_iterators.md_yaml_dialogs_iterator:MD_YAML_DialogsDatasetIterator",
+  "md_yaml_dialogs_ner_iterator": "deeppavlov.dataset_iterators.md_yaml_dialogs_ner_iterator:MD_YAML_DialogsDatasetNERIterator",
+  "md_yaml_dialogs_intents_iterator": "deeppavlov.dataset_iterators.md_yaml_dialogs_ner_iterator:MD_YAML_DialogsDatasetIntentsIterator",
+  "slotfill_raw_memorizing": "deeppavlov.models.slotfill.slotfill_raw:RASA_MemorizingSlotFillingComponent"
+}
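Each registry entry maps a short component name to a `package.module:ClassName` string. As a rough illustration of how such a string can be turned into a class, here is a minimal sketch that uses only `importlib`. The helper `resolve_component` and the `TOY_REGISTRY` dictionary are hypothetical names introduced for this example; this is not DeepPavlov's actual loader, and the toy entry points at a standard-library class so the snippet runs without DeepPavlov installed.

```python
import importlib
from typing import Dict

# Hypothetical stand-in for registry.json, using a standard-library target.
TOY_REGISTRY: Dict[str, str] = {
    "ordered_dict": "collections:OrderedDict",
}


def resolve_component(name: str, registry: Dict[str, str] = TOY_REGISTRY) -> type:
    """Resolve a registered name to a class from a 'module:ClassName' string."""
    module_path, class_name = registry[name].split(":")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


cls = resolve_component("ordered_dict")
```

Under this reading, the new `intent_catcher_iterator` entry would resolve to the `IntentCatcherIterator` class added in the next file.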
124 changes: 124 additions & 0 deletions deeppavlov/dataset_iterators/intent_catcher_iterator.py
@@ -0,0 +1,124 @@
# Copyright 2017 Neural Networks and Deep Learning lab, MIPT
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import itertools
import re
from logging import getLogger
from typing import Tuple, List, Dict, Any, Iterator

from xeger import Xeger

from deeppavlov.core.common.registry import register
from deeppavlov.core.data.data_learning_iterator import DataLearningIterator
from deeppavlov.dataset_readers.dto.rasa.nlu import Intents, IntentDesc

log = getLogger(__name__)


@register('intent_catcher_iterator')
class IntentCatcherIterator(DataLearningIterator):
    """
    Iterates over data for Intent Catcher training.
    A subclass of :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator`.

    Args:
        seed: random seed for data shuffling
        shuffle: whether to shuffle data during batching
        limit: maximum number of phrases generated from each input regexp

    """

    def __init__(self,
                 data: Dict[str, List[Tuple[Any, Any]]],
                 seed: int = None,
                 shuffle: bool = True,
                 limit: int = 10) -> None:
        self.limit = limit
        super().__init__(data, seed, shuffle)

    def gen_batches(self,
                    batch_size: int,
                    data_type: str = 'train',
                    shuffle: bool = None) -> Iterator[Tuple]:
        """Generate batches of inputs and expected outputs to train
        Intent Catcher

        Args:
            batch_size: number of samples in batch
            data_type: can be either 'train', 'test', or 'valid'
            shuffle: whether to shuffle dataset before batching

        Yields:
            a tuple of (regexp, generated sentence) pairs for the passed
            data_type, and the list of the generated sentences' labels
        """

        if shuffle is None:
            shuffle = self.shuffle

        ic_file_content: Intents = self.data[data_type]["nlu_lines"]
        sentences, labels = [], []
        for intent in ic_file_content.intents:
            for intent_line in intent.lines:
                sentences.append(intent_line.text)
                labels.append(intent.title)

        assert len(sentences) == len(labels), \
            "Number of labels is not equal to the number of sentences"

        try:
            regexps = [re.compile(s) for s in sentences]
        except Exception as e:
            log.error("Some sentences are not valid regular expressions")
            raise e

        proto_entries_indices = list(range(len(sentences)))
        if shuffle:
            self.random.shuffle(proto_entries_indices)

        if batch_size < 0:
            batch_size = len(proto_entries_indices)

        xeger = Xeger(self.limit)

        regexps, generated_sentences, generated_labels = [], [], []
        generated_cnt = 0
        for proto_entry_ix in proto_entries_indices:
            sent, lab = sentences[proto_entry_ix], labels[proto_entry_ix]
            regex_ = re.compile(sent)

            gx = {xeger.xeger(sent) for _ in range(self.limit)}
            generated_sentences.extend(gx)
            generated_labels.extend([lab for _ in range(len(gx))])
            regexps.extend([regex_ for _ in range(len(gx))])

            if len(generated_sentences) >= batch_size:  # one regexp can add several sentences at once
                # tuple(zip) below does [r1, r2, ..], [s1, s2, ..] -> ((r1, s1), (r2, s2), ..)
                yield tuple(zip(regexps, generated_sentences)), generated_labels
                generated_cnt += len(generated_sentences)
                regexps, generated_sentences, generated_labels = [], [], []

        if generated_sentences:
            yield tuple(zip(regexps, generated_sentences)), generated_labels
            generated_cnt += len(generated_sentences)
            regexps, generated_sentences, generated_labels = [], [], []

        log.info(f"Original number of samples: {len(sentences)}"
                 f", generated samples: {generated_cnt}")

    def get_instances(self, data_type: str = 'train') -> Tuple[tuple, tuple]:
        res = tuple(map(lambda it: tuple(itertools.chain(*it)),
                        zip(*self.gen_batches(batch_size=-1,
                                              data_type=data_type,
                                              shuffle=False))))
        return res
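The batching and flattening logic in this file can be hard to follow from the diff alone, so here is a small self-contained sketch of the same pattern. It is an illustration under stated assumptions, not the PR's code: `toy_gen_batches`, `flatten`, and the `expansions` dictionary are hypothetical names, and the pre-generated sentence lists stand in for the Xeger-based expansion so the snippet needs no third-party packages. The sketch flushes a batch with `>=`, since one pattern may contribute several sentences at once; that is a choice made for this example.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Batch = Tuple[Tuple[Tuple[str, str], ...], List[str]]


def toy_gen_batches(expansions: Dict[str, Tuple[str, List[str]]],
                    batch_size: int) -> Iterator[Batch]:
    """Yield ((pattern, sentence), ...) pairs with their labels.

    `expansions` maps a pattern to (label, pre-generated sentences); it is a
    hypothetical stand-in for the regexp-to-sentence generation step.
    """
    patterns, sentences, labels = [], [], []
    for pattern, (label, generated) in expansions.items():
        sentences.extend(generated)
        labels.extend([label] * len(generated))
        patterns.extend([pattern] * len(generated))
        # One pattern may add several sentences, so compare with >=.
        if len(sentences) >= batch_size:
            yield tuple(zip(patterns, sentences)), labels
            patterns, sentences, labels = [], [], []
    if sentences:
        # Flush the last, possibly smaller, batch.
        yield tuple(zip(patterns, sentences)), labels


def flatten(batches: Iterator[Batch]) -> Tuple[tuple, tuple]:
    """Concatenate all batches into (inputs, labels), as get_instances does."""
    return tuple(map(lambda it: tuple(itertools.chain(*it)), zip(*batches)))


expansions = {
    "(hi|hello)": ("greet", ["hi", "hello"]),
    "bye": ("goodbye", ["bye"]),
}
inputs, labels = flatten(toy_gen_batches(expansions, batch_size=2))
```

Here `zip(*batches)` regroups the per-batch `(inputs, labels)` pairs into one sequence of input tuples and one sequence of label lists, and `itertools.chain` concatenates each of them, which is the same trick `get_instances` applies to the output of `gen_batches`.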