Trademark Search Support #64

Open
firmai opened this issue Aug 18, 2022 · 11 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@firmai

firmai commented Aug 18, 2022

No description provided.

@firmai
Author

firmai commented Aug 18, 2022

Hi Parker, great tool. I would like your professional opinion: I want to do something fairly simple, but I have been banging my head against it for 3 days now. I am looking for a way to search for a brand or trademark like "Nike" and then get a list of possible owners like "Nike, Inc". I can retrieve this information from the website https://tmsearch.uspto.gov/, but I have no idea what API to use 🤷

@parkerhancock
Owner

Hey! Thanks for the note!

So, there is no API for that kind of thing, but you're correct that the closest USPTO service is TESS. So you'd need to automate manipulating TESS. I've done something similar for Pat/FT and App/FT on the patent side, neither of which has an API either. A good place to start is Freeform TESS searching.

I will say, I made a run years ago at including a TESS module in patent_client, and the persistent issue I had was cookie management. TESS was designed in ancient times (for the internet), so it has really obtuse ways of handling state that make it hard to wrap with a client library.

That said, I'm highly confident that it can be done. And would highly encourage you to make a go at it!

@parkerhancock parkerhancock added the enhancement New feature or request label Aug 18, 2022
@firmai
Author

firmai commented Aug 18, 2022

Thanks a lot Parker, that explains why I see so few attempts in doing that online!

@mustberuss

mustberuss commented Aug 18, 2022

The patent office has one, but you'll need an API key: https://developer.uspto.gov/api-catalog/tsdr-data-api (I have one, so they aren't hard to get). They have a Swagger UI page for it, but they didn't update it after they added the API key. I have an unsanctioned copy at https://mustberuss.github.io/TSDR-Swagger/ that I can't get them to adopt. It won't work online (CORS is not allowed on their end), but the generated curl commands work, in case you want to kick the API while you are writing a manager for it. Or even better, my Swagger object can be imported into Postman to give you a nicely loaded collection for the API: https://mustberuss.github.io/TSDR-Swagger/myswagger_v1_tsdr_uspto.json

Also check out https://github.com/Ethan3600/USPTO-Trademark-API (I contributed there a while back). I don't remember all the details, but I don't think it needed an API key.

Oops, my bad, for these you need to already know a registration or serial number. Back to trying to scrape TESS...

@firmai
Author

firmai commented Aug 19, 2022

Exactly, that makes it eternally tough.


@mustberuss

Not sure if anyone is working on this, but I found a script at https://stackoverflow.com/a/43519721 that would do the initial search. Something similar could go against the freeform search page, where the result size can be set up to 500 records. That page lists the codes for each field. My favorite trick is to AND in a condition that the Registration Number is not equal to 0, which limits the search to marks that have been registered:
(Nike)[COMB] not (0)[RN]
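
A minimal sketch of building that freeform query string in Python. The helper name `registered_marks_query` is hypothetical, and it only formats the text shown above; it does not submit anything to TESS.

```python
def registered_marks_query(term: str) -> str:
    """Build a freeform query for `term` in the [COMB] field,
    excluding records whose Registration Number [RN] is 0."""
    return f"({term})[COMB] not (0)[RN]"


print(registered_marks_query("Nike"))  # (Nike)[COMB] not (0)[RN]
```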

It's still not clear to me how to produce a manager, model and schema though. I'll need to reread the developer doc.

@parkerhancock parkerhancock added the help wanted Extra attention is needed label Sep 13, 2022
@parkerhancock parkerhancock changed the title Trademarks Trademark Search Support Sep 13, 2022
@parkerhancock
Owner

So, I was poking around on this one, and I think I have a solution. Similar to what I did over on Patent Public Search, I think I can bake the state management into the session object. So here we are - my scratch code for a TESS session that tracks the weird state object!

It does this with a few key features:

  1. The session handles the log in / log out semantics transparently. On the first request, it logs in, and when the object is destroyed, it logs out.

  2. The session spawns a "keep alive" thread that pings the homepage every 30 seconds to keep the session alive.

  3. Every response body is examined for an updated state string, which is continuously updated so the Session knows what the proper state is.

  4. Like the Patent Public Search session, it uses a template string {{state}} in URLs and parameters, so the client application can refer to the state inside those parameters without actually having to know what it is.

There's still a ways to go to make this a functioning part of Patent_Client, but I think this solves the hardest piece of the puzzle:

import re
import threading
import time  # needed for time.sleep() in keep_alive
from concurrent.futures import ThreadPoolExecutor

import requests


# Matches "state=<session_id>.<query_id>.<record_id>"; the separators are literal dots
state_re = re.compile(r"state=(?P<session_id>\d{4}:\w+)\.(?P<query_id>\d+)\.(?P<record_id>\d+)")

def dict_replace(dictionary, text, replacement):
    # Recursively iterate through a dictionary
    # and replace every value equal to "text" with "replacement"
    # Used to put the session state in request information
    if not dictionary:
        return dictionary
    for k, v in dictionary.items():
        if isinstance(v, dict):
            dictionary[k] = dict_replace(v, text, replacement)
        elif isinstance(v, str) and v == text:
            dictionary[k] = replacement
    return dictionary

class TessState():    
    def __init__(self, response_body):
        state = state_re.search(response_body).groupdict()
        self.session_id = state['session_id']
        self.query_id = state['query_id']
        self.record_id = state['record_id']
    
    def __str__(self):
        return f"{self.session_id}.{self.query_id}.{self.record_id}"

class TessSession(requests.Session):
    heartbeat_interval = 30
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.state = None
        self.state_lock = threading.Lock()
        self.keep_alive_thread = ThreadPoolExecutor(thread_name_prefix="TESS-Keep-Alive")
        self.headers['user-agent'] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
        
    def request(self, method, url, **kwargs):
        # Log in lazily on the first request
        with self.state_lock:
            logged_in = bool(self.state)
        if not logged_in:
            self.login()
        with self.state_lock:
            state = str(self.state)
        # Session.get()/post() pass the URL positionally, so substitute it here
        url = url.replace("{{state}}", state)
        for key in ("params", "data", "json"):
            if isinstance(kwargs.get(key), dict):
                kwargs[key] = dict_replace(kwargs[key], "{{state}}", state)
        
        response = super().request(method, url, **kwargs)
        # Track the newest state string found in any response body
        if state_re.search(response.text):
            with self.state_lock:
                self.state = TessState(response.text)
        return response
        
    
    def login(self):
        login_response = super().request("get", "https://tmsearch.uspto.gov/bin/gate.exe", params={"f": "login", "p_lang": "english", "p_d": "trmk"})
        with self.state_lock:
            self.state = TessState(login_response.text)
            print(f"Logged in! Current State: {self.state}")
        # Kill the existing keep-alive thread
        self.keep_alive_thread.shutdown(wait=False, cancel_futures=True)
        # Create a new keep-alive thread
        self.keep_alive_thread = ThreadPoolExecutor(thread_name_prefix="TESS-Keep-Alive")
        self.keep_alive_thread.submit(self.keep_alive)
        
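    def keep_alive(self):
        # Ping the TESS home page periodically so the server-side session stays alive.
        # Note: this loop never exits on its own; shutdown() cannot cancel a running task.
        while True:
            # Copy the state under the lock, but make the request outside it
            with self.state_lock:
                state = str(self.state)
            super().request("get", "https://tmsearch.uspto.gov/bin/gate.exe", params={"f": "tess", "state": state})
            time.sleep(self.heartbeat_interval)
            
    def __del__(self):
        # Stop scheduling keep-alive work, then log out of TESS if we ever logged in
        self.keep_alive_thread.shutdown(wait=False, cancel_futures=True)
        if self.state:
            super().request("post", "https://tmsearch.uspto.gov/bin/gate.exe", data={"state": str(self.state), "f": "logout", "a_logout": "Logout"})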
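
To see the state tracking and the {{state}} templating in isolation, here is a small offline sketch that makes no network calls. The helper names `extract_state` and `fill_state` are hypothetical, and the HTML fragment and its state value are invented for illustration; only the "session.query.record" shape comes from the scratch code above.

```python
import re

# Same shape as the pattern in the scratch code, with the dots escaped
STATE_RE = re.compile(
    r"state=(?P<session_id>\d{4}:\w+)\.(?P<query_id>\d+)\.(?P<record_id>\d+)"
)


def extract_state(body):
    """Return the 'session.query.record' string found in a response body, or None."""
    match = STATE_RE.search(body)
    if match is None:
        return None
    return "{session_id}.{query_id}.{record_id}".format(**match.groupdict())


def fill_state(params, state, placeholder="{{state}}"):
    """Return a copy of params with placeholder values replaced, recursing into nested dicts."""
    filled = {}
    for key, value in params.items():
        if isinstance(value, dict):
            filled[key] = fill_state(value, state, placeholder)
        elif value == placeholder:
            filled[key] = state
        else:
            filled[key] = value
    return filled


# Made-up response fragment; the state value is invented for illustration
sample_body = '<a href="/bin/gate.exe?f=tess&state=4807:abc123.1.1">TESS Home</a>'
state = extract_state(sample_body)
params = fill_state({"f": "tess", "state": "{{state}}"}, state)
print(params)  # {'f': 'tess', 'state': '4807:abc123.1.1'}
```

Unlike `dict_replace` above, this variant returns a new dictionary instead of mutating its argument, so the caller's template parameters can be reused after the state changes.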
                
        

@parkerhancock
Owner

#70 is a PR to track the work on this. I've abandoned using the separate keep-alive thread for TESS, mostly because it makes testability a total nightmare.

Instead, I'm trying to preserve all the necessary state to recreate any given result, so if the session expires, all necessary steps can be "replayed" to get back to the same spot.

The biggest issue now is getting to individual TESS records. The way that TESS links to them is using the state object, in the form {session_id}.{query_id}.{record_id}, so if the session ID ever expires, the only way to get back to a record is to replay the request, and fetch the matching record ID. Which is a pain. In the context of patent client, I think that means that every search result needs to have a bit of metadata with the original query, so that if the related TESS record needs to be fetched, the query can be repeated.

Fun times!
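
A rough sketch of the "replay" idea described above, not the code in the PR. Everything here is hypothetical: the `ResultStub` model, the `replay` helper, the invented serial numbers, and the assumption that a record ID corresponds to a 1-based position in the repeated result list, which this thread does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ResultStub:
    """One search hit plus the query that produced it (hypothetical model)."""
    query: str
    serial_number: str


def replay(stub: ResultStub, run_query: Callable[[str], List[str]]) -> Optional[int]:
    """Re-run the original query and return the record's new 1-based position,
    or None if the record no longer appears in the results."""
    serials = run_query(stub.query)
    try:
        return serials.index(stub.serial_number) + 1
    except ValueError:
        return None


def fake_search(query: str) -> List[str]:
    # Stand-in for a real TESS search in a fresh session; serial numbers are invented
    return ["00000001", "00000002", "00000003"]


stub = ResultStub(query="(Nike)[COMB] not (0)[RN]", serial_number="00000002")
print(replay(stub, fake_search))  # 2
```

Keying the lookup on a stable identifier such as the serial number, rather than on the old record ID, means the stub stays valid even if the result order changes between sessions.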

@mustberuss

Looks like we'll have some time to figure this out. I just attended an advanced trademark searching webinar where they mentioned that TESS will be replaced in about a year. (The slides are here and the recording will be posted in a few weeks. I learned a few things about regex-like searches, etc.)

@elvinagam

So, at the end of the day, do we have anything that does a simple trademark search, one that takes text as input and returns a list of trademarks matching that name?

@firmai
Author

firmai commented Jan 27, 2024

Nope, not yet. If I am wrong, please scream as loud as you can; this could save me hours on a monthly basis.
