JSON return format of the mobile judgments #10

Open
lintool opened this issue Jun 21, 2017 · 6 comments

@lintool
Member

lintool commented Jun 21, 2017

In the initial implementation, @salman1993's format for returning judgments from mobile assessors to participating teams is:

[{"topid":"jimmy1","tweetid":"875784501338660865","rel":1},
{"topid":"jimmy1","tweetid":"876849256283459584","rel":2}]

A few requested changes:

  • let's have 0 = non-relevant, 1 = redundant, and 2 = relevant. Currently, 1 and 2 are swapped. Yes, I know this makes it inconsistent with previous tracks, but this makes more sense.
  • can we have pushed time and assessed time in the output?
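To make the requested relabeling concrete, here is a minimal Python sketch that converts judgments from the current scheme to the requested one. The helper name `to_requested_scheme` is hypothetical and not part of the actual service; the mapping only reflects the statement above that 1 and 2 are currently swapped while 0 stays as it is.

```python
# Current scheme -> requested scheme. Only 1 and 2 trade places:
# requested is 0 = non-relevant, 1 = redundant, 2 = relevant.
SWAP = {0: 0, 1: 2, 2: 1}


def to_requested_scheme(judgments):
    """Return copies of the judgments with `rel` remapped (hypothetical helper)."""
    return [{**j, "rel": SWAP[j["rel"]]} for j in judgments]


current = [
    {"topid": "jimmy1", "tweetid": "875784501338660865", "rel": 1},
    {"topid": "jimmy1", "tweetid": "876849256283459584", "rel": 2},
]
converted = to_requested_scheme(current)
```

The input list is left unmodified, so the same response can still be inspected in its original form.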

Point of discussion:

In the current design, each participating team issues a REST API call to fetch all assessments for all topics. This means that each participating team must keep track of what assessments are new, i.e., since the last time they called the API. This is a deliberate decision since keeping state on our end would add an additional layer of complexity.

The downside, of course, is that the response JSON gets bigger and bigger as judgments accumulate...
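As a rough illustration of what "keeping track of what is new" could look like on the participant side, the sketch below remembers which (topic, tweet) pairs have already been seen and returns only the unseen ones from a full response. The function name `new_judgments` and the choice of a set as the local state are assumptions for illustration, not something the API prescribes.

```python
def new_judgments(response, seen):
    """Return judgments not seen before, recording them in `seen`.

    `response` is the full list returned by the API in the initial format;
    `seen` is a set of (topid, tweetid) pairs kept by the participant.
    """
    fresh = []
    for j in response:
        key = (j["topid"], j["tweetid"])
        if key not in seen:
            seen.add(key)
            fresh.append(j)
    return fresh


seen = set()
first = new_judgments(
    [{"topid": "jimmy1", "tweetid": "875784501338660865", "rel": 1}], seen
)
second = new_judgments(
    [
        {"topid": "jimmy1", "tweetid": "875784501338660865", "rel": 1},
        {"topid": "jimmy1", "tweetid": "876849256283459584", "rel": 2},
    ],
    seen,
)
```

Each call scans the whole response, so the cost grows with the number of accumulated judgments.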

@KaranSabhnani

I would prefer consistency in relevance annotations.

And having the API just return the new set of judgements would be better, but if adding that feature is a lot of work, then we can certainly take care of that at our end. Response JSON certainly would get bigger and bigger over days, but based on last year's statistics, the current format could have a list with as many as 13-15K objects on the last day. That's not too big to filter in real-time.

My suggestion would be to make the response JSON a bit flexible to support querying by topicID, i.e. to change the format to:

{"jimmy1":
[{"tweetid":"875784501338660865","rel":1},{"tweetid":"876849256283459584","rel":2}],
"jimmy2":
[{"tweetid":"755784596338660255","rel":1},{"tweetid":"458949256283452584","rel":0}]}
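For comparison, a participant could also derive this per-topic layout from the initial flat format on their own side. The following is a minimal sketch of that conversion; `group_by_topic` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_topic(judgments):
    """Convert the flat list format into a dict keyed by topic ID."""
    grouped = defaultdict(list)
    for j in judgments:
        grouped[j["topid"]].append({"tweetid": j["tweetid"], "rel": j["rel"]})
    return dict(grouped)


flat = [
    {"topid": "jimmy1", "tweetid": "875784501338660865", "rel": 1},
    {"topid": "jimmy1", "tweetid": "876849256283459584", "rel": 2},
    {"topid": "jimmy2", "tweetid": "755784596338660255", "rel": 1},
]
by_topic = group_by_topic(flat)
```

With the data grouped this way, looking up one topic's judgments is a single dictionary access such as `by_topic["jimmy1"]`.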

@aroegies
Member

The bigger issue is that we'd likely end up with two endpoints. One for a massive bulk return of all judged tweets (submitted by that team) and one for the incremental update.

The former remains necessary to allow participants to recover from potential catastrophic system failure (e.g., brown out, program crashes, grad student crashes the machine).

Given my understanding of the underlying architecture, the incremental update should not require too much heavy lifting but I'm also not going to promise something that someone else has to deliver.

@lintool
Member Author

lintool commented Jun 21, 2017

@KaranSabhnani I'd rather our endpoints be stateless, instead of us keeping a "pointer" of "last consumed". For example, what if the consumer (i.e., participant team) pulled their most recent judgments, but then forgot to store the judgments? Or their machine crashed? Then we'd need an API to "rewind" the pointer... or what @aroegies suggested above.

One middle ground might be to add an extra param in the REST API that's a timestamp - i.e., return everything after this timestamp. That would be an easy filter on our end.

@KaranSabhnani

@lintool based on your concerns and the things that @aroegies pointed out, it makes sense to keep the API endpoint stateless. And I like the idea of filtering by timestamp. Diving into the details, what would this timestamp be? Would it be the timestamp of the last judged tweet that the API returned to the participant, the time when the participant last fetched judgments, or something else?

@lintool
Member Author

lintool commented Jun 21, 2017

@KaranSabhnani the timestamp would be whatever the participating team wanted. The common use case would be "give me everything newer than t", where t is the last time I called the API.
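A minimal sketch of how this could play out, under stated assumptions: each judgment carries an `assessed` field holding a numeric timestamp (that field name is hypothetical, since the output format is still under discussion), `judgments_since` stands in for the server-side filter, and `Poller` is a hypothetical client that remembers when it last called the API.

```python
import time


def judgments_since(judgments, t):
    """Stand-in for the server-side filter: keep judgments assessed after t."""
    return [j for j in judgments if j["assessed"] > t]


class Poller:
    """Hypothetical client that passes its last call time as the timestamp."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable taking a timestamp, returning judgments
        self.last_call = 0.0    # 0.0 means "give me everything"

    def poll(self, now=None):
        now = time.time() if now is None else now
        fresh = self.fetch(self.last_call)
        self.last_call = now
        return fresh


# In-memory stand-in for the judgment store behind the endpoint.
store = [{"topid": "jimmy1", "tweetid": "875784501338660865", "rel": 1, "assessed": 10.0}]
poller = Poller(lambda t: judgments_since(store, t))

first = poller.poll(now=15.0)
store.append({"topid": "jimmy1", "tweetid": "876849256283459584", "rel": 2, "assessed": 20.0})
second = poller.poll(now=25.0)
```

Because the server keeps no pointer, a participant recovering from a crash can reset `last_call` to `0.0` and fetch the full set again.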

@KaranSabhnani

Sounds good. And what about the format? Are we sticking to the initial format?


3 participants