Fromage annotator and dff-fromage-image-skill #523
@ciwwwnd please,
I think that's everything. If you have any questions, write to us in the thread or reply to the comments here :)
One thing: I haven't launched it or talked to it yet. Once you've fixed the architecture and the code, ping me; I'll go through everything once more and only then chat with it.
@@ -0,0 +1,3 @@
SERVICE_PORT: 8069
You also have MAX_HISTORY_DEPTH and MAX_RESPONSES_ABOUT_PICS.
services/fromage/pipeline.yml
Outdated
- group: services
  connector:
    protocol: http
    timeout: 3.0
?
Nell, I think we removed the component.yml files. Sorry, things change very often. Please look at the latest dev and do it exactly the same way as there.
So that nothing extra is left in and everything needed is there.
And pipeline.yml too, I think. A component now contains only the service_configs folder instead.
This isn't yours.
This isn't yours.
WAIT_HOSTS: ''
WAIT_HOSTS: spelling-preprocessing:8074, sentseg:8011, badlisted-words:8018, intent-catcher:8014,
  image-captioning:8123, fromage:8069, dff-program-y-skill:8008, dff-intent-responder-skill:8012,
  dff-fromage-image-skill:8070, dff-image-skill:8124, convers-evaluation-selector:8009
WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-480}
The timeout.
services/fromage/README.md
Outdated
@@ -0,0 +1,16 @@
# FROMAGe Service
**FROMAGe** is a service that is used to get an image and respond accordingly to the user's questions. FROMAGe is based on grounding pretrained language models to the visual domain ([Grounding Language Models to Images for Multimodal Inputs and Outputs](https://arxiv.org/abs/2301.13823)).
**FROMAGe** is a service that is used to process an input image and respond to the user's questions accordingly. It is based on the [FROMAGe](https://github.com/kohjingyu/fromage/tree/main) model from [Grounding Language Models to Images for Multimodal Inputs and Outputs](https://arxiv.org/abs/2301.13823).
services/fromage/README.md
Outdated
# FROMAGe Service
**FROMAGe** is a service that is used to get an image and respond accordingly to the user's questions. FROMAGe is based on grounding pretrained language models to the visual domain ([Grounding Language Models to Images for Multimodal Inputs and Outputs](https://arxiv.org/abs/2301.13823)).

GPU RAM 5 GB, RAM 5 GiB.
GB not GiB
services/fromage/server.py
Outdated
logger = logging.getLogger(__name__)

FILE_SERVER_URL = os.getenv("FILE_SERVER_URL")
RET_SCALE_FACTOR = 0
I would move it out into external arguments. It's something to think about, though: do you want it to always be zero, or do you want to be able to set it directly in the args in docker-compose.override.yml? I'm for the latter, since you plan to extend FROMAGe, and other situations may need a different RET_SCALE_FACTOR.
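A minimal sketch of what reading the value from outside could look like, assuming it is passed to the container as an environment variable named `RET_SCALE_FACTOR` (the variable name, the helper `read_ret_scale_factor`, and the float type are assumptions for illustration, not something taken from this PR). A missing or malformed value falls back to the current hard-coded zero instead of failing at start-up.

```python
import os


def read_ret_scale_factor(default: float = 0.0) -> float:
    """Read RET_SCALE_FACTOR from the environment, falling back to `default`.

    Hypothetical helper: the environment variable name is an assumption.
    """
    raw = os.getenv("RET_SCALE_FACTOR")
    if raw is None or raw.strip() == "":
        # Nothing was passed in, so keep the previous hard-coded behaviour.
        return default
    try:
        return float(raw)
    except ValueError:
        # A malformed value should not crash the service at start-up.
        return default


RET_SCALE_FACTOR = read_ret_scale_factor()
```

With this shape the default behaviour stays the same as `RET_SCALE_FACTOR = 0`, and a different value can be supplied per distribution without touching server.py.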
- Restore the deleted informal letter files in skills/dff_template_prompted_skill.
- The PR description mentions that it "crashes in such-and-such cases". Keep in mind that every time a component crashes with an error, it gets restarted! For large neural network models that takes a considerable amount of time. So all possible crashes should be avoided, using try-except or sensible default values.
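One way to read the second point, as a rough sketch: wrap the model call so that any exception is logged and replaced with a default answer rather than bringing the container down. The names `safe_respond`, `generate`, and `DEFAULT_RESPONSE` are hypothetical and only stand in for whatever the service actually calls.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

DEFAULT_RESPONSE = ""  # hypothetical fallback returned when generation fails


def safe_respond(generate: Callable[[Dict], str], payload: Dict) -> List[str]:
    """Call `generate` and return its answer, or a default if anything fails."""
    try:
        return [generate(payload)]
    except Exception as exc:  # broad on purpose: a crash means a slow restart
        logger.exception("generation failed: %s", exc)
        return [DEFAULT_RESPONSE]
```

For example, `safe_respond(lambda p: p["text"].upper(), {})` hits a `KeyError` inside the callable and still returns `[""]`, so the process keeps running and the loaded model stays in memory.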
@@ -97,6 +97,8 @@ async def send(self, payload: Dict, callback: Callable):
dialog_len = len(dialog["human_utterances"])
if user_uttr.get("attributes", {}).get("image") is not None:
    skills_for_uttr.append("dff_image_skill")
    skills_for_uttr.append("dff_fromage_image_skill")
And if the image was in a previous utterance and is not in the current one, the skill won't be turned on at all? I thought you wanted it to be able to discuss the image for up to 5 utterances after it was sent.
This question is still open.
it should be like
if any(["image" in user_uttr.get("attributes", {}) for user_uttr in dialog["human_utterances"][-5:]])
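A slightly fuller sketch of that check, pulled out of the selector so it can be run on its own. The function names and the `IMAGE_WINDOW` constant are made up for illustration; the dialog layout (`human_utterances`, `attributes`, `image`) follows the diff above. Note one difference from the one-liner: this version tests `.get("image") is not None`, as the existing selector code does, so an `image` key set to `None` does not count.

```python
from typing import Dict, List

IMAGE_WINDOW = 5  # assumed: how many recent human utterances to inspect


def image_in_recent_utterances(dialog: Dict, window: int = IMAGE_WINDOW) -> bool:
    """Return True if any of the last `window` human utterances has an image."""
    recent = dialog.get("human_utterances", [])[-window:]
    return any(
        uttr.get("attributes", {}).get("image") is not None for uttr in recent
    )


def select_image_skills(dialog: Dict) -> List[str]:
    """Pick image-related skills for the current turn (illustrative only)."""
    skills: List[str] = []
    human_utterances = dialog.get("human_utterances", [])
    current = human_utterances[-1] if human_utterances else {}
    # dff_image_skill reacts only to an image in the current utterance.
    if current.get("attributes", {}).get("image") is not None:
        skills.append("dff_image_skill")
    # The FROMAGe skill stays on while an image is within the recent window.
    if image_in_recent_utterances(dialog):
        skills.append("dff_fromage_image_skill")
    return skills
```

With an image in the first of five human utterances the FROMAGe skill is still selected; once a sixth utterance arrives without a new image, it drops out.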
components/8iHHdjsnfhewkl.yml
Outdated
author: [email protected]
description: The service is built using the FROMAGe model, which is able to produce meaningful conversations with users about different images.
ram_usage: 5G
gpu_usage: 18G
docker-compose.override.yml specifies RAM, not GPU memory. Fix this to the correct values everywhere. Also, the values in docker-compose.override.yml are a hard limit: the container will be restarted once it exceeds its RAM limit. So set the limit to the peak value (for example, loading the model onto the GPU will most likely consume a lot of RAM).
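To get a rough number for that peak, one option is to log the process's own high-water mark after the model has been loaded. This is only a sketch under assumptions: it uses the standard-library `resource` module (Unix only), it measures a single process rather than the whole container, and the helper name `peak_rss_gib` is hypothetical.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Return this process's peak resident set size in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1024 ** 3


if __name__ == "__main__":
    # Call this after model loading to see how much RAM start-up needed.
    print(f"peak RSS: {peak_rss_gib():.2f} GiB")
```

The limit in docker-compose.override.yml would then be set with some headroom above the observed value.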
...emplate_prompted_skill/service_configs/dff-informal-letter-ru-prompted-skill/environment.yml
Outdated
"component": "components/xSwFvtAUdvtQosvzpb7oMg.yml",
"service": "skill_selectors/rule_based_selector/service_configs/agent"
"component": "components/dfsw4bji8bgjq2.yml",
"service": "skill_selectors/description_based_skill_selector/service_configs/agent"
Why is the description based skill selector used here, while the skill is added to the rule based selector?
Isn't the description based skill selector what's used now, as in dev of the main Dream, or not? I'm confused.
Should I bring rule_based back? I think I removed it everywhere, and a search through the folder doesn't turn up anything that looks like it.
Or did it end up in the wrong place?
My point is that you should be using the same skill selector whose code you added the skill to :)
Yes, bring back the rule based skill selector.
@@ -97,6 +97,8 @@ async def send(self, payload: Dict, callback: Callable):
dialog_len = len(dialog["human_utterances"])
if user_uttr.get("attributes", {}).get("image") is not None:
    skills_for_uttr.append("dff_image_skill")
    skills_for_uttr.append("dff_fromage_image_skill")
This question is still open.
try:
    # test_server.run_test(handler)
Turn the tests back on.
And try launching it locally to check that the skill starts up.
@@ -179,6 +181,7 @@ async def send(self, payload: Dict, callback: Callable):
skills_for_uttr.append("meta_script_skill")
skills_for_uttr.append("dummy_skill")
skills_for_uttr.append("dialogpt")  # generative skill
skills_for_uttr.append("dff_fromage_image_skill")
Do you want to turn the skill on in all cases? That's odd logic. It should work like this: turn it on for 5 utterances after an image is sent.
@@ -97,6 +97,9 @@ async def send(self, payload: Dict, callback: Callable):
dialog_len = len(dialog["human_utterances"])
if user_uttr.get("attributes", {}).get("image") is not None:
    skills_for_uttr.append("dff_image_skill")
    if dialog_len < 5:
        skills_for_uttr.append("dff_fromage_image_skill")
As it stands, you turn the skill on only within the first 5 utterances of the dialog, and only if there is an image. I think you want something different.
You need to check whether any of the last five human utterances contains an image. If one does, turn your skill on.
@@ -97,6 +97,8 @@ async def send(self, payload: Dict, callback: Callable):
dialog_len = len(dialog["human_utterances"])
if user_uttr.get("attributes", {}).get("image") is not None:
    skills_for_uttr.append("dff_image_skill")
    skills_for_uttr.append("dff_fromage_image_skill")
it should be like
if any(["image" in user_uttr.get("attributes", {}) for user_uttr in dialog["human_utterances"][-5:]])
@dilyararimovna @smilni PR moved to #562.
Everything related to the fromage annotator:
/dp-formatters:
[None, 'https://bla-bla', None, None, 'https://bla-bla']
Why an empty string? Let's move on to the annotator's server.py. If the text field is an empty string, we automatically send FROMAGe the latest picture and the query ‘What is the image?’, so that it returns information relevant to the image rather than just the last discussed message.
We can discuss an image for about 5 messages (because of how the formatter works).
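Roughly, the behaviour described above could be sketched like this. The function names and the list-of-URLs input shape are assumptions based on the example list in this description, not the actual formatter or server.py code; only the default query string comes from the text.

```python
from typing import List, Optional, Tuple

DEFAULT_PROMPT = "What is the image?"  # query used when the user sent no text


def last_image_url(image_urls: List[Optional[str]]) -> Optional[str]:
    """Return the most recent non-empty image URL, or None if there is none."""
    for url in reversed(image_urls):
        if url:
            return url
    return None


def build_request(
    image_urls: List[Optional[str]], text: str
) -> Optional[Tuple[str, str]]:
    """Pair the latest image with the user's text, or with the default query."""
    url = last_image_url(image_urls)
    if url is None:
        # No image in the window the formatter kept: nothing to annotate.
        return None
    prompt = text if text.strip() else DEFAULT_PROMPT
    return url, prompt
```

Because the URL list covers only the utterances the formatter keeps, an image stops being available once it falls out of that window, which matches the "about 5 messages" limit.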
What may not work correctly:
Skill:
It simply returns the FROMAGe annotation. There is a vague transition (a condition that always returns True) that carries no real meaning, but maybe there will be proper transitions between nodes someday...
Launch:
docker-compose -f docker-compose.yml -f assistant_dists/dream_multimodal/docker-compose.override.yml -f assistant_dists/dream_multimodal/dev.yml -f assistant_dists/dream_multimodal/proxy.yml up --build
docker-compose exec agent python -m deeppavlov_agent.run agent.channel=telegram agent.telegram_token=<INSERT_YOUR_TG_TOKEN_HERE> agent.pipeline_config=assistant_dists/dream_multimodal/pipeline_conf.json