Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Api key framework #100

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Empty file added ocflib/api/__init__.py
Empty file.
124 changes: 124 additions & 0 deletions ocflib/api/keys.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
from binascii import hexlify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why put the api key code in ocflib? should this be in ocfweb instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought the nice thing about ocfweb was that it wasn't tied to any databases and just used ocflib for all of its state-needing things, which does all of the database work? unless i am mistaken

idea was to have ocflib handle stateful operations and ocfweb just calls out to it when necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be a terrible idea to have the code that interacts with the database itself, e.g. key creation, be in ocflib and have ocfweb call out to it.

There's not really much state being carried here at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought the nice thing about ocfweb was that it wasn't tied to any databases and just used ocflib for all of its state-needing things, which does all of the database work? unless i am mistaken

I guess you could argue this, but it was never designed this way. I think the "database in a library" is a really bad pattern for a few reasons:

  • There is no single source of truth; you can't make schema migrations in a sane way, because everything can be running old versions of ocflib (think virtualenvs especially) which tries to perform the old queries on a new schema. As a general rule, I'd argue datastores should have one owner and that owner should provide a reliable interface for consumers which allows for backend changes.
  • The same argument for logic: let's say you change the length of API keys; ocfweb gets updated instantly but some other tool doesn't for a few days, and it's still generating the wrong API keys. If instead you make ocfweb the owner of the data and logic and provide an interface that things like CLIs can use, you both simplify the CLI code (and credential management!) and avoid buggy logic rollouts.
  • ocflib is used everywhere and it already has too many dependencies. Why does everything that needs ocfweb need to depend on pymysql, pysnmp, sqlalchemy, paramiko, etc.
  • Libraries can't do things like connection pooling, unlike applications; as a concrete example, we've had a couple cases of having to bump MySQL limits because we had too many connections from desktops using ocflib functions. They should've been hitting ocfweb API endpoints instead.
  • Even if you do add it to ocflib, it's not going to be useful to most consumers anyway: only ocfweb is going to care about API keys or have the credentials to deal with them. You still can't write a CLI using the functions because you don't have a secure way to get it access to that database (unless you're gonna use sudo/setuid like makemysql, which we hate).
  • Trying out your code gets harder; now you need to link your new ocflib against ocfweb so that you can actually see if you broke anything.

What is the benefit of adding another layer here? I think it just makes your lives harder in addition to the points above.

There's not really much state being carried here at all.

State isn't why I think we should avoid it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it seems I was mistaken on the point regarding ocfweb not being tied to any databases. For some reason I thought that this was intentionally done.

In that case, I believe the other points you made make sense, and it would be much easier to test this if it was just on ocfweb instead of ocflib.

If we were to place this in ocfweb instead, would we want to actually connect ocfweb to a mysql database in settings.py? And then for non-prod runs we would have it connect to like a sqlite instance instead of mysql so we don't have the problem of ocfweb being harder to start up. Just trying to gauge if that would be a good approach or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, yeah. We don't have prior art for that yet, but I think that's the right approach. If we're going to keep writing raw SQL (rather than using sqlalchemy or the django ORM), we'd want to use mysql in dev too though. Maybe make a shared dev database and put creds in the ocfweb config file (settings.py would read this, we don't want to embed creds in settings.py itself).

from os import urandom

import pymsql


"""Constants
Changes here should be reflected in the corresponding keys.sql file if
necessary
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't actually a docstring /shrug

KEY_LENGTH = 32


def _generate_key(length=KEY_LENGTH):
"""Return a random `length` string to be used as a key
"""
return hexlify(urandom(length)).decode()


def get_key(cursor, user):
"""Get the key corresponding to the user in the db

Args:
cursor (pymysql.cursors.Cursor): database cursor
user (str): user to get the key for

Returns:
(str) the user's key, or None if there is no matching row
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this docstring format is different than what we use elsewhere in ocflib/ocfweb

"""
cursor.execute(
'SELECT `key`'
'FROM `keys`'
'WHERE `user` LIKE %s',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LIKE should probably be =, we don't want to do wildcard matching here or to be case insensitive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, should be exact for this

(user,),
)
query_result = cursor.fetchone()

try:
return query_result['key']
except (KeyError, TypeError):
# No matching row
return None

def get_user(cursor, key):
"""Get the user corresponding to the key in the db

Args:
cursor (pymysql.cursors.Cursor): database cursor
key (str): key that corresponds to the user

Returns:
(str) user that owns the key or None if there is no matching row
"""
cursor.execute(
'SELECT `user`'
'FROM `keys`'
'WHERE `key` LIKE %s',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key should probably be hashed in the db? imo it'd be preferable to require api calls to provide both the user and the api key (you can combine them together if you want so that you use one Authorization header or whatever) and have the api keys hashed in the db. then we're not storing plaintext secrets in a database which can easily be abused in the event of an unnoticed leak. (you'll need to store the users in addition to the keys if you want to salt the hashes)

same comments here wrt docstring and LIKE

Copy link
Member

@Baisang Baisang Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some online research and thought also agree salt+hash is good

(key,),
)
query_result = cursor.fetchone()


try:
return query_result['user']
except (KeyError, TypeError):
# No matching row
return None

def key_exists(cursor, key):
"""Checks if the key currently exists in the db

Args:
cursor (pymysql.cursors.Cursor): database cursor
key (str): key to check for

Returns:
(bool) whether or not the key exists
"""
cursor.execute(
'SELECT 1'
'FROM `keys`'
'WHERE `key` LIKE %s',
(key,),
)
query_result = cursor.fetchone()

return query_result is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think you need this function?



def add_key(cursor, user):
"""Add a key for a user into the database

Args:
cursor (pymysql.cursors.Cursor): database cursor
user (str): user to add the key for

Returns:
None
"""
key = _generate_key()

# Re-generate key in case of collision
while key_exists(key):
key = _generate_key()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you just make the key long, you'll never have a collision. i think this is slightly overengineered.


cursor.execute(
'INSERT INTO `keys`'
'(`key`, `user`)'
'VALUES (%s, %s, %s)',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two columns, three values?

(key, user)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we won't be able to launch this unless there's also a way to expire/revoke compromised or unused keys

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely, this is just part 1 of a longer process



def get_connection(user, password, db='keys', **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db name should be prefixed with "ocf", maybe called "ocfapikeys" or something

"""Return a connection to MySQL."""
return pymysql.connect(
user=user,
password=password,
db=db,
host='mysql.ocf.berkeley.edu',
cursorclass=pymysql.cursors.DictCursor,
charset='utf8mb4',
**dict({'autocommit': True}, **kwargs)
)
6 changes: 6 additions & 0 deletions ocflib/api/keys.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
-- length of key defined by KEY_LENGTH in keys.py
CREATE TABLE IF NOT EXISTS `keys` (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this definitely needs more fields, e.g. an ID, creation/expiration time, revocation status, and if possible, what endpoints this grants access to, etc.

Copy link
Member

@Baisang Baisang Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what endpoints this grants access to

we have 0 endpoints currently (except like tv/hours i guess lmao) so we'd need to figure out a good format for storing this

`key` varchar(32) NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is weird with your lengths. the keys you generate are 64 characters long since you hex'd them (even though you "only" have 32 bytes of random), either you'll need to make this column binary 32 bytes and convert back and forth, or make this longer.

`user` varchar(255) NOT NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is user meant to be unique? or are you allowing users to have multiple api keys?

PRIMARY KEY(`key`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd suggest also storing some data like time generated

)