Remove unneeded casefolds in LoomBackend.simulate_joint #618

Closed
fsaad opened this issue Mar 6, 2018 · 1 comment

fsaad commented Mar 6, 2018

# Prepare the csv header and values.
csv_headers, csv_values = zip(*row.iteritems())
lower_to_upper = {str(a).lower(): str(a) for a in csv_headers}
csv_headers_str = [str(a).lower() for a in csv_headers]
csv_values_str = [str(a) for a in csv_values]
# Prepare streams for the server.
outfile = StringIO()
writer = loom.preql.CsvWriter(outfile, returns=outfile.getvalue)
reader = iter([csv_headers_str]+[csv_values_str])
# Obtain the prediction.
server._predict(reader, num_samples, writer, False)

Since the bayeslite schema uses COLLATE NOCASE, the returned variable names are always lower case. Even if they were not, Loom expects variable names to have the same capitalization that was used to create the models, which is the same capitalization that appears in the bayesdb table:

data_by_column = {}
for colno in bayesdb_variable_numbers(bdb, population_id, None):
    column_name = bayesdb_variable_name(bdb, population_id, None, colno)
    headers.append(column_name)
    qt = sqlite3_quote_name(table)
    qcn = sqlite3_quote_name(column_name)
    cursor = bdb.sql_execute('SELECT %s FROM %s' % (qcn, qt))
    col_data = [item for (item,) in cursor.fetchall()]
    data.append(col_data)
    data_by_column[column_name] = col_data
data = [list(i) for i in zip(*data)]
# Ingest data into loom.
schema_file = self._data_to_schema(bdb, population_id, data_by_column)
csv_file = self._data_to_csv(bdb, headers, data)
project_path = self._get_loom_project_path(bdb, generator_id)
loom.tasks.ingest(project_path, rows_csv=csv_file.name,
    schema=schema_file.name)

fsaad commented May 6, 2018

> Since the bayeslite schema uses COLLATE NOCASE the return variable names are always lower case

Not sure where I got this bizarre statement from. It is false: COLLATE NOCASE only tells SQLite to ignore case for comparison purposes; the entry is still stored in whatever case was given in the INSERT INTO statement.
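A minimal sketch of that behavior, using Python's standard sqlite3 module. The table and column names here are hypothetical and are not taken from the bayeslite schema; only the COLLATE NOCASE declaration matters.

```python
import sqlite3

# In-memory database with a hypothetical table; the column is declared NOCASE.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE variable (name TEXT COLLATE NOCASE)')
conn.execute("INSERT INTO variable (name) VALUES ('Height')")

# The stored text keeps the case that was given in the INSERT statement.
stored = conn.execute('SELECT name FROM variable').fetchone()[0]

# Comparisons against the column ignore ASCII case.
matches = conn.execute(
    "SELECT COUNT(*) FROM variable WHERE name = 'HEIGHT'").fetchone()[0]

print(stored, matches)
conn.close()
```

The lookup with a different capitalization still finds the row, while the value read back is spelled exactly as it was inserted.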

Nevertheless, the function bayesdb_variable_name always returns lower case variable names, since variable names in the bayesdb_variable table are always stored in lowercase by the interpreter (discussion #546):

bayeslite/src/bql.py

Lines 915 to 921 in 01638d2

# Insert variable records.
for nm, st in pop_all_vars:
    name = casefold(nm)
    stattype = casefold(st)
    if stattype == 'ignore':
        continue
    core.bayesdb_add_variable(bdb, population_id, name, stattype)

So we should still be able to remove the casefolds.
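A rough sketch of what the header preparation could look like without the casefolds. This is an illustration, not the change made in the closing commit: `prepare_csv_row` is a hypothetical helper, the row keys are assumed to be the already-lowercase names returned by bayesdb_variable_name, and `items()` is used here in place of the `iteritems()` call in the original snippet so that the example runs on Python 3.

```python
def prepare_csv_row(row):
    """Hypothetical helper: build CSV header and value lists for one row.

    Assumes the keys of `row` are already lowercase variable names, so no
    lower() call and no lower_to_upper mapping are needed.
    """
    csv_headers, csv_values = zip(*row.items())
    csv_headers_str = [str(a) for a in csv_headers]
    csv_values_str = [str(a) for a in csv_values]
    return csv_headers_str, csv_values_str


# Example row with made-up variable names and values.
row = {'height': 1.8, 'age': 42}
headers, values = prepare_csv_row(row)

# With lowercase names, lowercasing the headers again changes nothing.
assert [h.lower() for h in headers] == headers
```

Under that assumption the `lower_to_upper` dictionary from the original snippet would map every name to itself, which is why it can be dropped along with the `lower()` calls.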

fsaad closed this as completed in 56d86b1 May 6, 2018