Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Check if table already exists before creating it #158

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lopuhin
Copy link
Member

@lopuhin lopuhin commented Jun 18, 2016

If we have SQLALCHEMYBACKEND_DROP_ALL_TABLES = False, then without this fix we will fail to start db and strategy workers with an existing database, because the tables are already there, but we try to create them.

Potentially, creating the table without checking for existance could cause troubles even with SQLALCHEMYBACKEND_DROP_ALL_TABLES = True, when several workers would try to create required tables concurrently, but I never tried to reproduce this.

If we have SQLALCHEMYBACKEND_DROP_ALL_TABLES = False, then
without this fix we will fail to start db and strategy workers
with an existing database.
@lopuhin
Copy link
Member Author

lopuhin commented Jun 18, 2016

This check is not enough to prevent errors during concurrent table creation, at least with postgres. Although this looks more like a sqlalchemy bug:

db-worker-incoming-1_1  | Traceback (most recent call last):
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
db-worker-incoming-1_1  |     context)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
db-worker-incoming-1_1  |     cursor.execute(statement, parameters)
db-worker-incoming-1_1  | psycopg2.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
db-worker-incoming-1_1  | DETAIL:  Key (typname, typnamespace)=(queue_id_seq, 2200) already exists.
db-worker-incoming-1_1  | 
db-worker-incoming-1_1  | 
db-worker-incoming-1_1  | The above exception was the direct cause of the following exception:
db-worker-incoming-1_1  | 
db-worker-incoming-1_1  | Traceback (most recent call last):
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/runpy.py", line 170, in _run_module_as_main
db-worker-incoming-1_1  |     "__main__", mod_spec)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/runpy.py", line 85, in _run_code
db-worker-incoming-1_1  |     exec(code, run_globals)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/worker/db.py", line 280, in <module>
db-worker-incoming-1_1  |     worker = DBWorker(settings, args.no_batches, args.no_incoming)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/worker/db.py", line 74, in __init__
db-worker-incoming-1_1  |     self._manager = FrontierManager.from_settings(settings, db_worker=True)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 282, in from_settings
db-worker-incoming-1_1  |     strategy_worker=strategy_worker)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 243, in __init__
db-worker-incoming-1_1  |     strategy_worker=strategy_worker)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 24, in __init__
db-worker-incoming-1_1  |     self._backend = self._load_backend(backend, db_worker, strategy_worker)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 54, in _load_backend
db-worker-incoming-1_1  |     return cls.db_worker(self)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/contrib/backends/sqlalchemy/__init__.py", line 162, in db_worker
db-worker-incoming-1_1  |     queue_m.__table__.create(bind=b.engine, checkfirst=True)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/schema.py", line 725, in create
db-worker-incoming-1_1  |     checkfirst=checkfirst)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1856, in _run_visitor
db-worker-incoming-1_1  |     conn._run_visitor(visitorcallable, element, **kwargs)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1481, in _run_visitor
db-worker-incoming-1_1  |     **kwargs).traverse_single(element)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
db-worker-incoming-1_1  |     return meth(obj, **kw)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 764, in visit_table
db-worker-incoming-1_1  |     include_foreign_key_constraints=include_foreign_key_constraints
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 914, in execute
db-worker-incoming-1_1  |     return meth(self, multiparams, params)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
db-worker-incoming-1_1  |     return connection._execute_ddl(self, multiparams, params)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 968, in _execute_ddl
db-worker-incoming-1_1  |     compiled
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
db-worker-incoming-1_1  |     context)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
db-worker-incoming-1_1  |     exc_info
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
db-worker-incoming-1_1  |     reraise(type(exception), exception, tb=exc_tb, cause=cause)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 185, in reraise
db-worker-incoming-1_1  |     raise value.with_traceback(tb)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
db-worker-incoming-1_1  |     context)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
db-worker-incoming-1_1  |     cursor.execute(statement, parameters)
db-worker-incoming-1_1  | sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
db-worker-incoming-1_1  | DETAIL:  Key (typname, typnamespace)=(queue_id_seq, 2200) already exists.
db-worker-incoming-1_1  |  [SQL: '\nCREATE TABLE queue (\n\tid SERIAL NOT NULL, \n\tpartition_id INTEGER, \n\tscore FLOAT, \n\turl VARCHAR(1024) NOT NULL, \n\tfingerprint VARCHAR(40) NOT NULL, \n\thost_crc32 BIGINT NOT NULL, \n\tmeta BYTEA, \n\theaders BYTEA, \n\tcookies BYTEA, \n\tmethod VARCHAR(6), \n\tcreated_at BIGINT, \n\tdepth SMALLINT, \n\tPRIMARY KEY (id)\n)\n\n']

@sibiryakov
Copy link
Member

@lopuhin concurrent table creation is something we should avoid doing. Such a behavior isn't expected by both DB servers and clients. I recommend to redesign your application to avoid doing this.

Concurrent truncation of the table should work fine, btw.

@@ -132,7 +132,7 @@ def strategy_worker(cls, manager):
if drop_all_tables:
if model.__table__.name in inspector.get_table_names():
model.__table__.drop(bind=b.engine)
model.__table__.create(bind=b.engine)
model.__table__.create(bind=b.engine, checkfirst=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is happening because SQLA has non-unified behavior between various engines. For example that code works well for sqlite, without need of checkfirst=True. However, I still think it makes sense to merge it.

@lopuhin
Copy link
Member Author

lopuhin commented Jun 20, 2016

@sibiryakov thanks, I'll try to avoid concurrent table creation, that makes sense!

@sibiryakov
Copy link
Member

I would like to test that behavior. What if we'll be testing our backends with SQLALCHEMYBACKEND_DROP_ALL_TABLES enabled?

@lopuhin
Copy link
Member Author

lopuhin commented Jun 20, 2016

Yep, I'll add the tests (found them now).

@lopuhin lopuhin changed the title Check if table already exists before creating it WIP: Check if table already exists before creating it Jun 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants