Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Api key framework #100

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Api key framework #100

wants to merge 3 commits into from

Conversation

c2tonyc2
Copy link

@c2tonyc2 c2tonyc2 commented Nov 7, 2017

Initial Framework for API key-based auth system

This is a rough draft, would love to hear any concerns with the proposed schema.
Also are there any security concerns with storing keys in plain tied to users?

API for functions in keys.py could also use critique for signature or etc.

@Baisang
@douglaslwong

"""Constants
Changes here should be reflected in the corresponding keys.sql file if
necessary
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't actually a docstring /shrug

user (str): user to get the key for

Returns:
(str) the user's key, or None if there is no matching row
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this docstring format is different than what we use elsewhere in ocflib/ocfweb

cursor.execute(
'SELECT `key`'
'FROM `keys`'
'WHERE `user` LIKE %s',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LIKE should probably be =, we don't want to do wildcard matching here or to be case insensitive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, should be exact for this

cursor.execute(
'SELECT `user`'
'FROM `keys`'
'WHERE `key` LIKE %s',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key should probably be hashed in the db? imo it'd be preferable to require api calls to provide both the user and the api key (you can combine them together if you want so that you use one Authorization header or whatever) and have the api keys hashed in the db. then we're not storing plaintext secrets in a database which can easily be abused in the event of an unnoticed leak. (you'll need to store the users in addition to the keys if you want to salt the hashes)

same comments here wrt docstring and LIKE

Copy link
Member

@Baisang Baisang Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some online research and thought also agree salt+hash is good


# Re-generate key in case of collision
while key_exists(key):
key = _generate_key()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you just make the key long, you'll never have a collision. i think this is slightly overengineered.

-- length of key defined by KEY_LENGTH in keys.py
CREATE TABLE IF NOT EXISTS `keys` (
`key` varchar(32) NOT NULL,
`user` varchar(255) NOT NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is user meant to be unique? or are you allowing users to have multiple api keys?

@@ -0,0 +1,6 @@
-- length of key defined by KEY_LENGTH in keys.py
CREATE TABLE IF NOT EXISTS `keys` (
`key` varchar(32) NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is weird with your lengths. the keys you generate are 64 characters long since you hex'd them (even though you "only" have 32 bytes of random), either you'll need to make this column binary 32 bytes and convert back and forth, or make this longer.

'(`key`, `user`)'
'VALUES (%s, %s, %s)',
(key, user)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we won't be able to launch this unless there's also a way to expire/revoke compromised or unused keys

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely, this is just part 1 of a longer process

CREATE TABLE IF NOT EXISTS `keys` (
`key` varchar(32) NOT NULL,
`user` varchar(255) NOT NULL
PRIMARY KEY(`key`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd suggest also storing some data like time generated

@@ -0,0 +1,124 @@
from binascii import hexlify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why put the api key code in ocflib? should this be in ocfweb instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought the nice thing about ocfweb was that it wasn't tied to any databases and just used ocflib for all of its state-needing things, which does all of the database work? unless i am mistaken

idea was to have ocflib handle stateful operations and ocfweb just calls out to it when necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be a terrible idea to have the code that interacts with the database itself, e.g. key creation, be in ocflib and have ocfweb call out to it.

There's not really much state being carried here at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought the nice thing about ocfweb was that it wasn't tied to any databases and just used ocflib for all of its state-needing things, which does all of the database work? unless i am mistaken

I guess you could argue this, but it was never designed this way. I think the "database in a library" is a really bad pattern for a few reasons:

  • There is no single source of truth; you can't make schema migrations in a sane way, because everything can be running old versions of ocflib (think virtualenvs especially) which tries to perform the old queries on a new schema. As a general rule, I'd argue datastores should have one owner and that owner should provide a reliable interface for consumers which allows for backend changes.
  • The same argument for logic: let's say you change the length of API keys; ocfweb gets updated instantly but some other tool doesn't for a few days, and it's still generating the wrong API keys. If instead you make ocfweb the owner of the data and logic and provide an interface that things like CLIs can use, you both simplify the CLI code (and credential management!) and avoid buggy logic rollouts.
  • ocflib is used everywhere and it already has too many dependencies. Why does everything that needs ocfweb need to depend on pymysql, pysnmp, sqlalchemy, paramiko, etc.
  • Libraries can't do things like connection pooling, unlike applications; as a concrete example, we've had a couple cases of having to bump MySQL limits because we had too many connections from desktops using ocflib functions. They should've been hitting ocfweb API endpoints instead.
  • Even if you do add it to ocflib, it's not going to be useful to most consumers anyway: only ocfweb is going to care about API keys or have the credentials to deal with them. You still can't write a CLI using the functions because you don't have a secure way to get it access to that database (unless you're gonna use sudo/setuid like makemysql, which we hate).
  • Trying out your code gets harder; now you need to link your new ocflib against ocfweb so that you can actually see if you broke anything.

What is the benefit of adding another layer here? I think it just makes your lives harder in addition to the points above.

There's not really much state being carried here at all.

State isn't why I think we should avoid it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it seems I was mistaken on the point regarding ocfweb not being tied to any databases. For some reason I thought that this was intentionally done.

In that case, I believe the other points you made make sense, and it would be much easier to test this if it was just on ocfweb instead of ocflib.

If we were to place this in ocfweb instead, would we want to actually connect ocfweb to a mysql database in settings.py? And then for non-prod runs we would have it connect to like a sqlite instance instead of mysql so we don't have the problem of ocfweb being harder to start up. Just trying to gauge if that would be a good approach or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, yeah. We don't have prior art for that yet, but I think that's the right approach. If we're going to keep writing raw SQL (rather than using sqlalchemy or the django ORM), we'd want to use mysql in dev too though. Maybe make a shared dev database and put creds in the ocfweb config file (settings.py would read this, we don't want to embed creds in settings.py itself).

@@ -0,0 +1,6 @@
-- length of key defined by KEY_LENGTH in keys.py
CREATE TABLE IF NOT EXISTS `keys` (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this definitely needs more fields, e.g. an ID, creation/expiration time, revocation status, and if possible, what endpoints this grants access to, etc.

Copy link
Member

@Baisang Baisang Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what endpoints this grants access to

we have 0 endpoints currently (except like tv/hours i guess lmao) so we'd need to figure out a good format for storing this

@@ -0,0 +1,124 @@
from binascii import hexlify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be a terrible idea to have the code that interacts with the database itself, e.g. key creation, be in ocflib and have ocfweb call out to it.

There's not really much state being carried here at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants