Activity for khipp/transformers — public fork of huggingface/transformers, owned by Klaus Hipp (@khipp), default branch `main`, created 2024-01-26.

Pushes to `main`, newest first. Each entry lists the push date, the number of commits synced from upstream, and the head commit at that point:

- 2024-09-12 (44 commits): [docs] add the missing tokenizer when pushing models to huggingface hub (#33428)
- 2024-09-05 (60 commits): Fix: Fix `FalconMamba` training issues due to incompatible kernels (#33195)
- 2024-08-28 (30 commits): [RoBERTa-based] Add support for sdpa (#30510)
- 2024-08-24 (47 commits): Update Jinja docs with new functions and general cleanup (#33097)
- 2024-08-18 (13 commits): generate: missing `to` in DoLa body, causing exceptions in multi-gpu generation (#32856)
- 2024-08-15 (36 commits): fix: update doc link for runhouse in README.md (#32664)
- 2024-08-11 (63 commits): 🌐 [i18n-KO] Translated `agent.md` to Korean (#32351)
- 2024-08-03 (2 commits): MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input (#31500)
- 2024-08-02 (15 commits): [generate] only require an attention mask for mps with torch<2.4 (#32367)
- 2024-08-01 (22 commits): LLaVa: add cache class attribute (#32278)
- 2024-07-30 (3 commits): use torch 2.4 in 2 CI jobs (#32302)
- 2024-07-29 (41 commits): [pipeline] fix padding for 1-d tensors (#31776)
- 2024-07-23 (53 commits): [docs] change temperature to a positive value (#32077)
- 2024-07-18 (35 commits): fix: Removed duplicate entries in a dictionary (#32041)
- 2024-07-15 (6 commits): Whisper: move tensor to cpu before converting to np array at decode time (#31954)
- 2024-07-11 (5 commits): [Bug Fix] fix qa pipeline tensor to numpy (#31585)
- 2024-07-11 (8 commits): Refactor flash attention implementation in transformers (#31446)
- 2024-07-11 (38 commits): Processor accepts any kwargs (#31889)
- 2024-07-07 (32 commits): Depth Anything: update conversion script for V2 (#31522)
- 2024-07-02 (1 commit): dependencies: `keras-nlp<0.14` pin (#31684)
- 2024-06-29 (18 commits): Add French version of run scripts tutorial (#31483)
- 2024-06-26 (29 commits): Skip tests properly (#31308)
- 2024-06-23 (3 commits): New model support RTDETR (#29077)
- 2024-06-21 (95 commits): Deprecate legacy cache + use cache position (#31491)
- 2024-06-07 (113 commits): Extend save_pretrained to offloaded models (#27412)
test\r\n\r\n* equality hidden debug\r\n\r\n* debug\r\n\r\n* added prints for debug\r\n\r\n* added prints for debug\r\n\r\n* equality check\r\n\r\n* switched squeeze dim\r\n\r\n* input format debug\r\n\r\n* tracing top_k_ids\r\n\r\n* removed trace\r\n\r\n* added test context\r\n\r\n* added jitter\r\n\r\n* added jitter\r\n\r\n* added jitter\r\n\r\n* returned state\r\n\r\n* rebuilt past key value reconstruction\r\n\r\n* debugged\r\n\r\n* cleaned traces\r\n\r\n* added selection for pkv\r\n\r\n* changed output to dict\r\n\r\n* cleaned\r\n\r\n* cleaned\r\n\r\n* cleaned up contrastive search test\r\n\r\n* moved low_memory kwarg\r\n\r\n* debugged\r\n\r\n* changed low mem test batch size to 1\r\n\r\n* removed output\r\n\r\n* debugged test input shape\r\n\r\n* reformatted csearch test\r\n\r\n* added trace\r\n\r\n* removed unsqueeze on final forward pass\r\n\r\n* replaced unsqueeze with view\r\n\r\n* removed traces\r\n\r\n* cleaned\r\n\r\n* debugged model kwargs\r\n\r\n* removed special models from test\r\n\r\n* ran make quality\r\n\r\n* Update src/transformers/generation/configuration_utils.py\r\n\r\nCo-authored-by: Joao Gante \r\n\r\n* Update src/transformers/generation/configuration_utils.py\r\n\r\nCo-authored-by: Joao Gante \r\n\r\n* refactored\r\n\r\n* refactored\r\n\r\n* refactored\r\n\r\n* make fixup\r\n\r\n* renamed flag sequential\r\n\r\n* renamed flag sequential\r\n\r\n* iterative onloading\r\n\r\n* black style and test utils\r\n\r\n* added traces for integrated test\r\n\r\n* debugged\r\n\r\n* added traces\r\n\r\n* make style\r\n\r\n* removed traces, make style\r\n\r\n* included suggestions and added test\r\n\r\n* debugged test\r\n\r\n* added offload module check and make style\r\n\r\n* is_accelerate_available and make style\r\n\r\n* added test decorator\r\n\r\n* changed test model and config spec\r\n\r\n* added offload condition\r\n\r\n* added lazy loading for each shard\r\n\r\n* debugged\r\n\r\n* modified sharding\r\n\r\n* debugged\r\n\r\n* added traces\r\n\r\n* 
removed safe serialization\r\n\r\n* no index overload;\r\n\r\n* trace on safe save ptrs\r\n\r\n* added ptr condition\r\n\r\n* debugged\r\n\r\n* debugged ptr\r\n\r\n* moved module map init\r\n\r\n* remake shard only for offloaded modules\r\n\r\n* refactored\r\n\r\n* debugged\r\n\r\n* refactored\r\n\r\n* debugged\r\n\r\n* cleaned and make style\r\n\r\n* cleaned and make style\r\n\r\n* added trace\r\n\r\n* sparse module map\r\n\r\n* debugged\r\n\r\n* removed module map conditional\r\n\r\n* refactored\r\n\r\n* debug\r\n\r\n* debugged\r\n\r\n* added traces\r\n\r\n* added shard mem trace\r\n\r\n* added shard mem trace\r\n\r\n* removed underlying storage check\r\n\r\n* refactored\r\n\r\n* memory leak removal and make style\r\n\r\n* cleaned\r\n\r\n* swapped test decs and make style\r\n\r\n* added mem checks and make style\r\n\r\n* added free mem warning\r\n\r\n* implemented some suggestions\r\n\r\n* moved onloading to accelerate\r\n\r\n* refactored for accelerate integration\r\n\r\n* cleaned test\r\n\r\n* make style\r\n\r\n* debugged offload map name\r\n\r\n* cleaned and make style\r\n\r\n* replaced meta device check for sharding\r\n\r\n* cleaned and make style\r\n\r\n* implemented some suggestions\r\n\r\n* more suggestions\r\n\r\n* update warning\r\n\r\nCo-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>\r\n\r\n* more suggestions\r\n\r\n* make style\r\n\r\n* new make style\r\n\r\n* Update src/transformers/modeling_utils.py\r\n\r\nCo-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>\r\n\r\n* Update src/transformers/modeling_utils.py\r\n\r\nCo-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>\r\n\r\n* Update src/transformers/modeling_utils.py\r\n\r\nCo-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>\r\n\r\n* Update src/transformers/modeling_utils.py\r\n\r\nCo-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>\r\n\r\n---------\r\n\r\nCo-authored-by: Joao Gante \r\nCo-authored-by: Marc 
Sun <57196510+SunMarc@users.noreply.github.com>\r\nCo-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>","shortMessageHtmlLink":"Extend save_pretrained to offloaded models (huggingface#27412)"}},{"before":"bdb9106f247fca48a71eb384be25dbbd29b065a8","after":"9d35edbb30625489bf286a9b15aed0c5a3119c1c","ref":"refs/heads/main","pushedAt":"2024-05-27T19:20:51.000Z","pushType":"push","commitsCount":7,"pusher":{"login":"khipp","name":"Klaus Hipp","path":"/khipp","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9824526?s=80&v=4"},"commit":{"message":"skip `test_model_parallelism` for 2 model test classes (#31067)\n\nskip\r\n\r\nCo-authored-by: ydshieh ","shortMessageHtmlLink":"skip test_model_parallelism for 2 model test classes (huggingface#3…"}},{"before":"ccdabc5642bf84849af93f591e207dc625c8e1e1","after":"bdb9106f247fca48a71eb384be25dbbd29b065a8","ref":"refs/heads/main","pushedAt":"2024-05-24T23:01:48.000Z","pushType":"push","commitsCount":109,"pusher":{"login":"khipp","name":"Klaus Hipp","path":"/khipp","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9824526?s=80&v=4"},"commit":{"message":"Paligemma- fix devices and dtype assignments (#31008)\n\n* fix devices and dtype assignments\r\n\r\n* [run-slow]paligemma","shortMessageHtmlLink":"Paligemma- fix devices and dtype assignments (huggingface#31008)"}},{"before":"47735f5f0f2752500d115d2f6bd57816032599b6","after":"ccdabc5642bf84849af93f591e207dc625c8e1e1","ref":"refs/heads/main","pushedAt":"2024-05-14T15:06:13.000Z","pushType":"push","commitsCount":30,"pusher":{"login":"khipp","name":"Klaus Hipp","path":"/khipp","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9824526?s=80&v=4"},"commit":{"message":"Add JetMoE model (#30005)\n\n* init jetmoe code\r\n\r\n* update archive maps\r\n\r\n* remove flax import\r\n\r\n* fix import error\r\n\r\n* update README\r\n\r\n* ruff fix\r\n\r\n* update readme\r\n\r\n* fix\r\n\r\n* update config\r\n\r\n* fix issue\r\n\r\n* merge 
files\r\n\r\n* fix model bug\r\n\r\n* fix test\r\n\r\n* auto fix\r\n\r\n* model size\r\n\r\n* add comments\r\n\r\n* fix form\r\n\r\n* add flash attention support\r\n\r\n* fix attention head number\r\n\r\n* fix init\r\n\r\n* fix support list\r\n\r\n* sort auto mapping\r\n\r\n* fix test\r\n\r\n* fix docs\r\n\r\n* update test\r\n\r\n* fix test\r\n\r\n* fix test\r\n\r\n* change variable name\r\n\r\n* fix config\r\n\r\n* fix init\r\n\r\n* update format\r\n\r\n* clean code\r\n\r\n* fix config\r\n\r\n* fix config\r\n\r\n* change default config\r\n\r\n* update config\r\n\r\n* fix issues\r\n\r\n* update formate\r\n\r\n* update config argument\r\n\r\n* update format\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* change to mixtral aux loss\r\n\r\n* change to cache_position\r\n\r\n* debug\r\n\r\n* fix bugs\r\n\r\n* debug\r\n\r\n* fix format\r\n\r\n* fix format\r\n\r\n* fix copy\r\n\r\n* fix format\r\n\r\n* fix format\r\n\r\n* fix sort\r\n\r\n* fix sort\r\n\r\n* fix sort\r\n\r\n* add copy comment\r\n\r\n* add copy from\r\n\r\n* remove debug code\r\n\r\n* revert readme update\r\n\r\n* add copy\r\n\r\n* debug\r\n\r\n* remove debug code\r\n\r\n* fix flash attention\r\n\r\n* add comments\r\n\r\n* clean code\r\n\r\n* clean format\r\n\r\n* fix format\r\n\r\n* fix format\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* 
Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* change variable name\r\n\r\n* add copied from\r\n\r\n* fix variable name\r\n\r\n* remove deprecated functinos\r\n\r\n* sync to llama implementation\r\n\r\n* fix format\r\n\r\n* fix copy\r\n\r\n* fix format\r\n\r\n* update format\r\n\r\n* remove repr\r\n\r\n* add comment for moe weight\r\n\r\n* fix copy\r\n\r\n* Update src/transformers/models/jetmoe/configuration_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update 
src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* add comments and reformat config\r\n\r\n* fix format\r\n\r\n* fix format\r\n\r\n* fix format\r\n\r\n* update test\r\n\r\n* update doc string in config\r\n\r\n* Update src/transformers/models/jetmoe/modeling_jetmoe.py\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\n\r\n* update config doc\r\n\r\n* update attention cache\r\n\r\n* fix format\r\n\r\n* fix copy\r\n\r\n---------\r\n\r\nCo-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>","shortMessageHtmlLink":"Add JetMoE model (huggingface#30005)"}},{"before":"4208c428f6a42e6f58ab44014e696bdf49def855","after":"47735f5f0f2752500d115d2f6bd57816032599b6","ref":"refs/heads/main","pushedAt":"2024-05-10T00:08:51.000Z","pushType":"push","commitsCount":21,"pusher":{"login":"khipp","name":"Klaus Hipp","path":"/khipp","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9824526?s=80&v=4"},"commit":{"message":"[docs] Update es/pipeline_tutorial.md (#30684)\n\n* copy en/ contect to es/\r\n\r\n* translate first section\r\n\r\n* translate the doc\r\n\r\n* fix typos\r\n\r\n* run make style","shortMessageHtmlLink":"[docs] Update es/pipeline_tutorial.md 
(huggingface#30684)"}},{"before":"a0e77a1f6bdfcceccdc5618e8a01ee32ef47bfa8","after":"4208c428f6a42e6f58ab44014e696bdf49def855","ref":"refs/heads/main","pushedAt":"2024-05-07T13:11:24.000Z","pushType":"push","commitsCount":26,"pusher":{"login":"khipp","name":"Klaus Hipp","path":"/khipp","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9824526?s=80&v=4"},"commit":{"message":"Separate tokenizer tests (#30675)\n\n* nit\r\n\r\n* better filter\r\n\r\n* pipeline tests should only be models/xxx not anything else\r\n\r\n* nit to better see filtering of the files that are passed to test torch\r\n\r\n* oups","shortMessageHtmlLink":"Separate tokenizer tests (huggingface#30675)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEtB6fvAA","startCursor":null,"endCursor":null}},"title":"Activity · khipp/transformers"}