-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Population variables lose case distinctions #546
Comments
Just hit this issue as a bug proper.
And the failure happens here:
Also notice duplicate of |
Actually, what broke this behavior was @riastradh-probcomp's fix d7c5b3f to this issue, which appears to have caused the regression. Update: git revert d7c5b3f indeed fixed the issue with Latent variables reported above. Surprised why nothing in the test suite failed after the commit in question (probably because all cases were lower uniformly). |
@riastradh-probcomp I'm going to revert this commit --- it has completely broken the create schema logic in https://github.com/probcomp/bayeslite/blob/master/src/metamodels/cgpm_metamodel.py#L1043, and refactoring that complex state machine to is likely to be more effort and introduce more bugs than updating downstream clients to either use the variable name from the base table and/or do case insensitive comparisons. |
This state machine already appears to be problematic from a cursory glance. But it doesn't look like it needs refactoring -- it looks like it just needs a few casefolds inserted judiciously: replace |
Here's an example I expect to fail with the state machine as is:
|
I'm running against this problem too. I want to do a JOIN of a dependence probability table and a table read from a CSV file, and the cases aren't matching. If there's some reason to keep this (mis)feature, maybe it could be a CREATE POPULATION option, so you could decide at that point whether you want folding or not? |
Date: Tue, 29 Aug 2017 19:29:28 +0000 (UTC)
From: Jonathan Rees probably ***@***.***>
I'm running against this problem too. I want to do a JOIN of a
dependence probability table and a table read from a CSV file, and
the cases aren't matching. If there's some reason to keep this
(mis)feature, maybe it could be a CREATE POPULATION option, so you
could decide at that point whether you want folding or not?
I don't think adding more configuration knobs is a good idea.
It seems to me:
(a) The table should retain the case of the CREATE TABLE columns.
(b) The population should retain the case of the CREATE POPULATION variables.
(c) Everything in bayeslite should be identified case-insensitively.
Since the implementation of JOIN is deferred to sqlite3, I am
surprised that you are hitting anything about mismatched cases,
because all names in sqlite3 are identified case-insensitively.
What reproduces the symptom you're referring to?
|
I think I made an assumption, because it just didn't occur to me that sqlite3 would do case insensitive comparison. I was seeing a very slow JOIN and thought this was a likely cause. But if what you say it true, then it has a different cause. Is = case insensitive as well? |
Date: Tue, 29 Aug 2017 12:50:17 -0700
From: Jonathan Rees probably ***@***.***>
I think I made an assumption, because it just didn't occur to me
that sqlite3 would do case insensitive comparison. I was seeing a
very slow JOIN and thought this was a likely cause. But if what you
say it true, then it has a different cause.
Is = case insensitive as well?
On columns declared COLLATE NOCASE, yes. The semantics is described
at <https://www.sqlite.org/datatype3.html#collation>. The default is
COLLATE BINARY, which is case-sensitive. In the bayeslite metadata
tables, e.g. bayesdb_population, names (population names, variable
names, &c.) are stored with COLLATE NOCASE -- see src/schema.py for
details.
|
This isn't itself quite a bug, but downstream things sometimes expect the case to match from queries, where sqlite3 preserves the original case of the variable in the underlying table, and from the results of
bayesdb_variable_names
.We should fix everything downstream to do case-insensitive comparisons (e.g.: probcomp/iventure#38), but until then, we can work around those mistakes by using the original case in the
bayesdb_variable
table. Won't help with existing bdb files, but will suppress downstream mistakes for any new bdb files.The text was updated successfully, but these errors were encountered: