superduper.io Changelog

All notable changes to this project will be documented in this file.

The format is inspired by (but does not strictly follow) Keep a Changelog, and this project adheres to Semantic Versioning.

Before you create a Pull Request, remember to update the Changelog with your changes.

Changes Since Last Release

Changed defaults / behaviours

  • Deprecate vanilla DataType
  • Remove _Encodable from project
  • Connect to Snowflake using the in-cluster OAuth token
  • Add postprocess in apibase model.
  • Fallback for ibis drop table
  • Add create events waiting on db apply.

New Features & Functionality

  • Streamlit component and server
  • Graceful updates when making incremental changes
  • Include templates remotely
  • Remove duplicate jobs
  • Add template information to regular component
  • Add diff when re-applying component
  • Add schema to Template
  • Low-code form builder for frontend
  • Add snowflake vector search engine
  • Add a meta-datatype Vector to handle different databackend requirements
  • Support <var:template_staged_file> in Template to enable apps to use files/folders included in the template
  • Add Data Component for storing data directly in the template
  • Add a standalone flag in Streamlit to mark the page as independent.

Bug Fixes

  • Catch exceptions in updates rather than breaking
  • Fix the issue where the trigger fails in custom components.
  • Fix serialization between vector-search client and vector-search backend with to_dict
  • Fix the bug in the update diff check that replaces uuids
  • Fix snowflake vector search issues.
  • Fix crontab drop method.
  • Fix job status update.
  • Fix a random bug caused by queue thread-safety issues when OSS used crontab.
  • Fix the bug in the update mechanism that fails when the parent references an existing child.

0.4.0 (2024-Nov-02)

Changed defaults / behaviours

  • Change images docker superduper/ to superduperio/
  • Change the image's user from /home/superduperdb to /home/superduper
  • Add message broker service config
  • Add helper dict method in Event.
  • Use declare_component from base class.
  • Use different colors to distinguish logs
  • Change all the _outputs. to _outputs__
  • Disable cdc on output tables.
  • Remove - from the uuid of the component.
  • Add _execute_select and filter in the Query class.
  • Optimize the logic of ready_ids in trigger_ids.
  • Move all plugins superduperdb/ext/* to /plugins
  • Optimize the logic for file saving and retrieval in the artifact_store.
  • Add backfill on load of vector index
  • Add startup event to initialize db.apply jobs
  • Update job_id after job submission
  • Fixed default event.uuid
  • Fixed atlas vector search
  • Fix the bug where shared artifacts are deleted when removing a component.
  • Fix compatibility issues with the latest version of pymongo.
  • Fix the query parser incompatibility with '.' symbol.
  • Fix the post-like feature in the vector_search service.
  • Fix the conflict of the same identifier during encoding.
  • Fix the issue where MongoDB did not create the output table.
  • Fix the bug in the CI where plugins are skipping tests.
  • Updated CONTRIBUTING.md
  • Add README.md files for the plugins.
  • Add templates to project
  • Add frontend to project
  • Change compute init order in cluster initialize
  • Add table error exception and sql table length fallback.
  • Permissions of artifacts increased
  • Make JSON-able a configuration depending on the databackend
  • Restore some training test cases
  • Simple querying shell
  • Fix existing templates
  • Add optional data insert to Table
  • Make Lance vector searcher as plugin.
  • Remove job dependencies from job metadata

New Features & Functionality

  • Modify the field name output to _outputs.predict_id in the model results of Ibis.
  • Remove the document_embed function.
  • Support MongoDB outputs query
  • Make "create a table" compulsory
  • All datatypes should be wrapped with a Schema
  • Support eager mode
  • Add CSN env var
  • Make tests configurable against backend
  • Make the prefix of the output string configurable
  • Add testing utils for plugins
  • Add cache field in Component
  • Add predict_id in Listener
  • Add serve in Model
  • Added templates directory with OSS templates
  • Qdrant vector search support
  • Add placeholder for web app link in Application
  • Add support for remote artifacts
  • Add basic rest server
  • Add @trigger decorator to improve developer experience
  • ModelRouter to enable easy toggles
  • Simple interactive shell
  • Add pdf rag template
  • Updated llm_finetuning template
  • Add sql table length exceed limit and uuid truncation.
  • Add ci workflow to test templates
  • Add deploy flag in model.
  • Modify the apply endpoint in the REST API to an asynchronous interface.
  • Add db apply wait on create events.

Bug Fixes

  • Vector-search vector-loading bug fixed
  • Bugs related to new queuing paradigm
  • Remove --user from make install_devkit, as it is supposed to run in a virtualenv.
  • component info support list
  • Trigger downstream vector indices.
  • Fix vector_index function job.
  • Fix verbosity in component info
  • Change default encoding to sqlvector
  • Fix some links in documentation
  • Change __dataclass_params__ to _dataclass_params
  • Make component reload after caching in apply
  • Fix a minor bug in schedule_jobs
  • Fix vector index cleanup
  • Fix the condition of the CDC job.
  • Fix form_template
  • Fix the duplicate children in the component.
  • Fix datatype for graph models
  • Fix bug in variables
  • Fix Qdrant collection name
  • Fix the ordering and sequencing of jobs initiated on db.apply
  • Fix rest routes with db injection and prefix.
  • Fix cluster db setter.
  • Fix decode function

0.3.0 (2024-Jun-21)

Changed defaults / behaviours

  • Renamed superduperdb -> superduper
  • Added data_pipeline_deps test case

New Features & Functionality

  • Add plugin component.
  • QueryTemplate component
  • Support for packaging application from the database.
  • Added DataInit component
  • Refactor ray jobs

Bug Fixes

  • Fix templates
  • Fix the issue of the filter in select not working in the listener.
  • Fix exports
  • Fix model query
  • Fix doc-strings
  • Fix support for keys in new queue handler.
  • Fix the bug where the query itself changes after encoding
  • Fix the dependency error in copy_vectors within vector_index.
  • Fix Template substitutions
  • Fix remove un_use_import function
  • Fix some linting and small refactors.

0.2.0 (2024-Jun-21)

Changed defaults / behaviours

  • Run Tests from within container
  • Add model dict output indexing in graph
  • Make lance upsert for added vectors
  • Make vectors normalized in inmemory vector database for cosine measure
  • Add local cluster as tmux session
  • At the end of the test, drop the collection instead of the database
  • Force load vector indices during backfill
  • Fix pandas database (in-memory)
  • Add and update docstrings in component classes and methods
  • Changed the REST implementation to use the new serialization
  • Remove unused dead code from the project
  • Auto wrap insert documents as Document instances
  • Mask special character keys in mongodb queries
  • Fix listener cleanup after removal
  • Don't require @dc.dataclass or @merge_docstrings decorator
  • Make output of Document.encode() more minimalistic
  • Increment minimum supported ibis version to 9.0.0
  • Make database connections reconnect on token expiry
  • Prototype the cron job services
  • Simplified Taskworkflow

New Features & Functionality

  • Add nightly image for pre-release testing in the cloud environment
  • Fix torch model fit and make schedule_jobs at db add
  • Add requires functionality for all extension modules
  • CI fails if CHANGELOG.md is not updated on PRs
  • Update Menu structure and renamed use-cases
  • Change and simplify the contract for writing new _Predictor descendants (.predict_one, .predict)
  • Add file datatype to support saving and reading files/folders in artifact_store
  • Create models directly by importing package from auto and with decorator @objectmodel, @torchmodel
  • Support Schema option for MongoDB
  • Optimize LLM fine-tuning
  • Sort out the llm directory structure
  • Add cache support in inmemory vector searcher
  • Add compute_kwargs option for model
  • Add BulkWrite mongodb query
  • Rename _Predictor to Model
  • Allow developers to write Listeners and Graph in a single formalism
  • Change unittesting framework to pure configuration (no patching configs)
  • Add a simple REST server implementation
  • Add reusable snippets shared across the docs
  • Added snippet for connecting to superduper in docs
  • Added support to serialize documents in a flat way "_leaves"
  • Added lazy_file datatype
  • Show the DataLayer configuration
  • Optimized LLM finetuning usage experience
  • Auto-infer Schema from data
  • Lazy-creation of output tables for ibis to enable auto-inference of output schema
  • Add database packages that improve deployment and connection testing
  • Enable dependency injection on image builders
  • Add database package for oracle
  • Reconstruct data serialization and database queries
  • Auto-create tables and schemas
  • Add Application and Template support to build reusable apps
  • Add pretty-print to Component.info
  • Model
  • Add pluggable compute backend via config

Bug Fixes

  • Fixed cross-platform issue in cli command
  • Separate nightly release from sandbox
  • Fixed a bug in refresh_after_insert for listeners with select None
  • Refactor graph internal with input mapping
  • Fixed a bug in Component init
  • Fixed a bug in predict in db for missing outputs
  • Fixed a bug in variable set
  • Fixed the bug where select in listener is modified in schedule_jobs.
  • LLM CI random errors
  • VectorIndex schedule_jobs missing function
  • Fixed some bugs of the cdc RAG application
  • Fixed open source RAG Pipeline
  • Fixed vllm real-time task concurrency bug
  • Fixed Post-Like feature
  • Added CORS Policy regarding REST server implementation
  • Fixed some bugs in multimodal usecase
  • Fixed File datatype
  • Fixed a bug in artifact store to skip duplicate artifacts
  • Fixed database permission issues when connecting to mongodb
  • Handle ProgrammingError of Snowflake for non-existing objects
  • Updated the use cases.
  • Update references to components and artifacts.
  • Fix Ray compute async with job submission api.
  • Refactor document encode
  • Change '_leaves' to '_builds'
  • Fixed empty identifier of Code.from_object.
  • Fixed Native encodable.
  • Fix ibis cdc and cdc config
  • Fixed 'objectmodel' and 'predict_one' in doc.
  • Fixed ray dependencies bug.
  • Fixed listener dependencies bug.
  • Fixed cluster bug.

0.1.3 (2024-Jun-20)

Test release before v0.2

0.1.1 (2024-Feb-09)

Changed defaults / behaviours

  • Test suite takes config from external .env file
  • Added support for multi key in model predict
  • Support Python 3.10+ due to required dataclass features
  • Updated the table creation method in MetaDataStore to improve compatibility across various databases
  • Replaced JSON data with String format before storage in SQLAlchemy
  • Implemented storage of byte data in base64 format
  • Migrated MongoDB Atlas vector search as a standalone searcher like lance
  • Deprecated Demo Image. Now Notebooks run in Colab
  • Replace dask with ray compute backend
  • All training and validation parameters to be configured in _Predictor attributes (.trainer, .train_X, etc.)
  • Docker build can include optional custom requirements.txt path

New Features & Functionality

  • Add Llama cpp model in extensions.
  • Basic Ray server support to serve models on a Ray cluster
  • Add Graph mode support to chain models
  • Simplify the testing of SQL databases using containerized databases
  • Integrate Monitoring (cAdvisor/Prometheus) and Logging (Promtail/Loki) with Grafana in the testenv
  • Add QueryModel and SequentialModel to make chaining searches and models easier
  • Add insert_to=<table-or-collection> to .predict to allow single predictions to be saved.
  • Support vLLM (running locally or remotely on a ray cluster)
  • Support LLM service in OpenAI format
  • Add lazy loading of artifacts by default

Bug Fixes

  • Update connection uris in sql_examples.ipynb to include snippets for Embedded, Cloud, and Distributed databases
  • Fixed a bug related to using Clickhouse as both databackend and metastore

0.1.0 (2023-Dec-05)

New Features & Functionality

  • Introduced Chinese version of README

Bug Fixes

  • Updated paths for docker-compose.

0.0.20 (2023-Dec-04)

Changed defaults / behaviours

  • Chop down large files from the history to reduce the size of the repo

0.0.19 (2023-Dec-04)

Changed defaults / behaviours

  • Add Changelog for tracking changes on the repo. It must be filled before any PR
  • Remove ci-pinned-dependencies and replace them with actions with better cache management
  • Change logging mechanism from the default to loguru
  • Update icons on the README.
  • Reboot test-suite, with modular approach to toggling between SQL and MongoDB tests
  • Add model-versioning of model-outputs
  • Refactor OpenAI code to use the new features of the OpenAI API
  • Fixes for dask worker compute delegation
  • Wrap compute with abstraction as component of datalayer
  • Simplify approach to project configuration
  • Add services for vector-search and CDC for more comprehensive cluster mode
  • Add a Component.post_create hook to enable logic to incorporate model versions
  • Fix multiple issues with ibis/ SQL code

New Features & Functionality

  • Add support for selecting whether logs will be redirected to the system output or directly to Loki

Bug Fixes

  • Added libgl libraries in Dockerfile to correctly render the video in notebooks

0.0.15 (2023-Nov-01)

Changed defaults / behaviors

  • Updated readme by @fnikolai in #1196.
  • Removed unused import by @jieguangzhou in #1205.
  • Updated README.md with contributors by @thejumpman2323 in #1201.
  • Added conditional builders in Dockerfile by @fnikolai in #1213.
  • Optimized unit tests by @jieguangzhou in #1204.

New Features & Functionality

  • Updated README.md with announcement emoji by @thejumpman2323 in #1222.
  • Launched announcement by @fnikolai in #1208.
  • Added raw SQL in ibis by @thejumpman2323 in #1220.
  • Added experimental keyword by @fnikolai in #1218.
  • Added query table by @thejumpman2323 in #1212.
  • Merged Ashishpatel26 main by @blythed in #1224.
  • Bumped Version to 0.0.15 by @fnikolai in #1225.

Bug Fixes

  • Fixed dependencies and makefile by @fnikolai in #1209.
  • Fixed demo release by @fnikolai in #1210.

0.0.14 (2023-Oct-27)

0.0.13 (2023-Oct-19)

0.0.12 (2023-Oct-12)

0.0.11 (2023-Oct-10)

0.0.10 (2023-Oct-09)

0.0.9 (2023-Oct-06)

0.0.8 (2023-Sep-29)

0.0.7 (2023-Sep-14)

0.0.6 (2023-Aug-29)

0.0.5 (2023-Aug-15)

0.0.4 (2023-Aug-03)