All notable changes to this project will be documented in this file.
The format is inspired by (but not strictly follows) Keep a Changelog, and this project adheres to Semantic Versioning.
Before you create a Pull Request, remember to update the Changelog with your changes.
- Deprecate vanilla
DataType
- Remove
_Encodable
from project - Connect to Snowflake using the incluster oauth token
- Add postprocess in apibase model.
- Fallback for ibis drop table
- Add create events waiting on db apply.
- Streamlit component and server
- Graceful updates when making incremental changes
- Include templates remotely
- Remove duplicate jobs
- Add template information to regular component
- Add diff when re-applying component
- Add schema to
Template
- Low-code form builder for frontend
- Add snowflake vector search engine
- Add a meta-datatype
Vector
to handle different databackend requirements - Support
<var:template_staged_file>
in Template to enable apps to use files/folders included in the template - Add Data Component for storing data directly in the template
- Add a standalone flag in Streamlit to mark the page as independent.
- Catch exceptions in updates vs. breakign
- Fix the issue where the trigger fails in custom components.
- Fix serialization between vector-search client and vector-search backend with
to_dict
- Fix the bug in the update diff check that replaces uuids
- Fix snowflake vector search issues.
- Fix crontab drop method.
- Fix job status update.
- Fix a random bug caused by queue thread-safety issues when OSS used crontab.
- Fix the bug in the update mechanism that fails when the parent references an existing child.
0.4.0 (2024-Nov-02)
- Change images docker superduper/ to superduperio/
- Change the image's user from
/home/superduperdb
to/home/superduper
- Add message broker service config
- Add helper dict method in Event.
- Use declare_component from base class.
- Use different colors to distinguish logs
- Change all the
_outputs.
to_outputs__
- Disable cdc on output tables.
- Remove
-
from the uuid of the component. - Add _execute_select and filter in the Query class.
- Optimize the logic of ready_ids in trigger_ids.
- Move all plugins superduperdb/ext/* to /plugins
- Optimize the logic for file saving and retrieval in the artifact_store.
- Add backfill on load of vector index
- Add startup event to initialize db.apply jobs
- Update job_id after job submission
- Fixed default event.uuid
- Fixed atlas vector search
- Fix the bug where shared artifacts are deleted when removing a component.
- Fix compatibility issues with the latest version of pymongo.
- Fix the query parser incompatibility with '.' symbol.
- Fix the post like in the service vector_search.
- Fix the conflict of the same identifier during encoding.
- Fix the issue where MongoDB did not create the output table.
- Fix the bug in the CI where plugins are skipping tests.
- Updated CONTRIBUTING.md
- Add README.md files for the plugins.
- Add templates to project
- Add frontend to project
- Change compute init order in cluster initialize
- Add table error exception and sql table length fallback.
- Permissions of artifacts increased
- Make JSON-able a configuration depending on the databackend
- Restore some training test cases
- Simple querying shell
- Fix existing templates
- Add optional data insert to
Table
- Make Lance vector searcher as plugin.
- Remove job dependencies from job metadata
- Modify the field name output to _outputs.predict_id in the model results of Ibis.
- Remove the document_embed function.
- Support MongoDB outputs query
- Make "create a table" compulsory
- All datatypes should be wrapped with a Schema
- Support eager mode
- Add CSN env var
- Make tests configurable against backend
- Make the prefix of the output string configurable
- Add testing utils for plugins
- Add
cache
field in Component - Add predict_id in Listener
- Add serve in Model
- Added templates directory with OSS templates
- Qdrant vector search support
- Add placeholder for web app link in Application
- Add support for remote artifacts
- Add basic rest server
- Add
@trigger
decorator to improve developer experience ModelRouter
to enable easy toggles- Simple interactive shell
- Add pdf rag template
- Updated llm_finetuning template
- Add sql table length exceed limit and uuid truncation.
- Add ci workflow to test templates
- Add deploy flag in model.
- Modify the apply endpoint in the REST API to an asynchronous interface.
- Add db apply wait on create events.
- Vector-search vector-loading bug fixed
- Bugs related to new queuing paradigm
- Remove --user from make install_devkit as it supposed to run on a virtualenv.
- component info support list
- Trigger downstream vector indices.
- Fix vector_index function job.
- Fix verbosity in component info
- Change default encoding to sqlvector
- Fix some links in documentation
- Change
__dataclass_params__
to_dataclass_params
- Make component reload after caching in apply
- Fix a minor bug in schedule_jobs
- Fix vector index cleanup
- Fix the condition of the CDC job.
- Fix form_template
- Fix the duplicate children in the component.
- Fix datatype for graph models
- Fix bug in variables
- Fix Qdrant collection name
- Fix the ordering and sequencing of jobs initiated on
db.apply
- Fix rest routes with db injection and prefix.
- Fix cluster db setter.
- Fix decode function
0.3.0 (2024-Jun-21)
- Renamed superduper -> superduper
- Added data_pipeline_deps test case
- Add plugin component.
- QueryTemplate component
- Support for packaging application from the database.
- Added DataInit component
- Refactor ray jobs
- Fix templates
- Fix the issue of the filter in select not working in the listener.
- Fix exports
- Fix model query
- Fix doc-strings
- Fix support for keys in new queue handler.
- Fix the bug where the query itself changes after encoding
- Fix the dependency error in copy_vectors within vector_index.
- Fix Template substitutions
- Fix remove un_use_import function
- Fix some linting and small refactors.
0.2.0 (2024-Jun-21)
- Run Tests from within container
- Add model dict output indexing in graph
- Make lance upsert for added vectors
- Make vectors normalized in inmemory vector database for cosine measure
- Add local cluster as tmux session
- At the end of the test, drop the collection instead of the database
- Force load vector indices during backfill
- Fix pandas database (in-memory)
- Add and update docstrings in component classes and methods
- Changed the rest implementation to use new serialization
- Remove unused deadcode from the project
- Auto wrap insert documents as Document instances
- Changed the rest implementation to use the new serialization
- Mask special character keys in mongodb queries
- Fix listener cleanup after removal
- Don't require
@dc.dataclass
or@merge_docstrings
decorator - Make output of
Document.encode()
more minimalistic - Increment minimum supported ibis version to 9.0.0
- Make database connections reconnection on token expiry
- Prototype the cron job services
- Simplified Taskworkflow
- Add nightly image for pre-release testing in the cloud environment
- Fix torch model fit and make schedule_jobs at db add
- Add requires functionality for all extension modules
- CI fails if CHANGELOG.md is not updated on PRs
- Update Menu structure and renamed use-cases
- Change and simplify the contract for writing new
_Predictor
descendants (.predict_one
,.predict
) - Add file datatype type to support saving and reading files/folders in artifact_store
- Create models directly by importing package from auto and with decorator
@objectmodel
,@torchmodel
- Support Schema option for MongoDB
- Optimize LLM fine-tuning
- Sort out the llm directory structure
- Add cache support in inmemory vector searcher
- Add compute_kwargs option for model
- Add BulkWrite mongodb query
- Rename
_Predictor
toModel
- Allow developers to write
Listeners
andGraph
in a single formalism - Change unittesting framework to pure configuration (no patching configs)
- Add a simple REST server implementation
- Add reusable snippets that are reused across the docs
- Added snippet for connecting to superduper in docs
- Added support to serialize documents in a flat way "_leaves"
- Added
lazy_file
datatype - Show the DataLayer configuration
- Optimized LLM finetuning usage experience
- Auto-infer Schema from data
- Lazy-creation of output tables for ibis to enable auto-inference of output schema
- Add database packages that improve deployment and connection testing
- Enable dependency injection on image builders
- Add database package for oracle
- Reconstruct data serialization and database queries
- Auto-create tables and schemas
- Add
Application
andTemplate
support to build reusable apps - Add pretty-print to
Component.info
Model
- 'Add pluggable compute backend via config'
- Fixed cross platfrom issue in cli command
- Separate nightly release from sandbox
- Fixed a bug in refresh_after_insert for listeners with select None
- Refactor graph internal with input mapping
- Fixed a bug in Component init
- Fixed a bug in predict in db for missing ouptuts
- Fixed a bug in variable set
- Fixed the bug where select in listener is modified in schedule_jobs.
- LLM CI random errors
- VectorIndex schedule_jobs missing function
- Fixed some bugs of the cdc RAG application
- Fixed open source RAG Pipeline
- Fixed vllm real-time task concurrency bug
- Fixed Post-Like feature
- Added CORS Policy regarding REST server implementation
- Fixed some bugs in multimodal usecase
- Fixed File datatype
- Fixed a bug in artifact store to skip duplicate artifacts
- Fixed database permission issues when connecting to mongodb
- Handle ProgrammingError of SnowFlake for non-existing objects
- Updated the use cases.
- Update references to components and artifacts.
- Fix Ray compute async with job submission api.
- Refactor document encode
- Change '_leaves' to '_builds'
- Fixed empty identifier of Code.from_object.
- Fixed Native encodable.
- Fix ibis cdc and cdc config
- Fixed 'objectmodel' and 'predict_one' in doc.
- Fixed ray dependencies bug.
- Fixed listener dependencies bug.
- Fixed cluster bug.
0.1.3 (2024-Jun-20)
Test release before v0.2
0.1.1 (2024-Feb-09)
- Test suite takes config from external .env file
- Added support for multi key in model predict
- Support 3.10+ due to
dataclass
supported features - Updated the table creation method in MetaDataStore to improve compatibility across various databases
- Replaced JSON data with String format before storage in SQLAlchemy
- Implemented storage of byte data in base64 format
- Migrated MongoDB Atlas vector search as a standalone searcher like lance
- Deprecated Demo Image. Now Notebooks run in Colab
- Replace dask with ray compute backend
- All training and validation parameters to be configured in
_Predictor
attributes (.trainer
,.train_X
, etc.) - Docker build can include optional custom
requirements.txt
path
- Add Llama cpp model in extensions.
- Basic Ray server support to server models on ray cluster
- Add Graph mode support to chain models
- Simplify the testing of SQL databases using containerized databases
- Integrate Monitoring(cadvisor/Prometheus) and Logging (promtail/Loki) with Grafana, in the
testenv
- Add
QueryModel
andSequentialModel
to make chaining searches and models easier - Add
insert_to=<table-or-collection>
to.predict
to allow single predictions to be saved. - Support vLLM (running locally or remotely on a ray cluster)
- Support LLM service in OpenAI format
- Add lazy loading of artifacts by default
- Update connection uris in
sql_examples.ipynb
to include snippets for Embedded, Cloud, and Distributed databases - Fixed a bug related to using Clickhouse as both databackend and metastore
0.1.0 (2023-Dec-05)
- Introduced Chinese version of README
- Updated paths for docker-compose.
0.0.20 (2023-Dec-04)
- Chop down large files from the history to reduce the size of the repo
0.0.19 (2023-Dec-04)
- Add Changelog for tracking changes on the repo. It must be filled before any PR
- Remove ci-pinned-dependencies and replaced them with actions with better cache management
- Change logging mechanism from the default to loguru
- Update icons on the README.
- Reboot test-suite, with modular approach to toggling between SQL and MongoDB tests
- Add model-versioning of model-outputs
- Refactor OpenAI code to use the new features of the OpenAI API
- Fixes for dask worker compute delegation
- Wrap compute with abstraction as component of datalayer
- Simplify approach to project configuration
- Add services for vector-search and CDC for more comprehensive cluster mode
- Add a
Component.post_create
hook to enable logic to incorporate model versions - Fix multiple issues with
ibis
/ SQL code
- Add support for selecting whether logs will be redirected to the system output or directly to Loki
- Added libgl libraries in Dockerfile to correctly render the video in notebooks
0.0.15 (2023-Nov-01)
- Updated readme by @fnikolai in #1196.
- Removed unused import by @jieguangzhou in #1205.
- Updated README.md with contributors by @thejumpman2323 in #1201.
- Added conditional builders in Dockerfile by @fnikolai in #1213.
- Optimized unit tests by @jieguangzhou in #1204.
- Updated README.md with announcement emoji by @thejumpman2323 in #1222.
- Launched announcement by @fnikolai in #1208.
- Added raw SQL in ibis by @thejumpman2323 in #1220.
- Added experimental keyword by @fnikolai in #1218.
- Added query table by @thejumpman2323 in #1212.
- Merged Ashishpatel26 main by @blythed in #1224.
- Bumped Version to 0.0.15 by @fnikolai in #1225.
- Fixed dependencies and makefile by @fnikolai in #1209.
- Fixed demo release by @fnikolai in #1210.