[Base] design suggestion for Search-Service #3

Open
rajatkb opened this issue Feb 22, 2020 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers gssoc20 GSSOC label for gscco20 tag medium GSSOC label for beginner tag

Comments

@rajatkb
Owner

rajatkb commented Feb 22, 2020

The search service needs to let users search the data stored by the Scrapper-Service.
**REQUIREMENTS**

  • Should expose a simple search API over Elasticsearch.
  • Ideally, the solution should minimize the data stored and indexed in Elasticsearch while still returning relevant results.
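
As a rough illustration of both requirements, the sketch below builds an Elasticsearch request body for a simple search and trims a scraped record down to the fields worth indexing. The field names (`title`, `description`, `deadline`, `url`) and the helper names are assumptions for illustration; the actual Scrapper-Service schema is not shown in this thread.

```python
import json

# Hypothetical field names; the real Scrapper-Service schema may differ.
SEARCH_FIELDS = ["title^2", "description"]  # "^2" boosts matches in the title
RETURN_FIELDS = ["title", "deadline", "url"]
INDEXED_FIELDS = {"title", "description", "deadline", "url"}


def build_search_body(text, size=10):
    """Build a query-DSL body for a full-text search over the chosen fields."""
    return {
        "query": {"multi_match": {"query": text, "fields": SEARCH_FIELDS}},
        "_source": RETURN_FIELDS,  # return only what the client needs
        "size": size,
    }


def trim_for_index(record):
    """Keep only the fields worth indexing, so less data is stored in ES."""
    return {k: v for k, v in record.items() if k in INDEXED_FIELDS}


if __name__ == "__main__":
    print(json.dumps(build_search_body("machine learning"), indent=2))
```

A search endpoint could POST this body to the index's `_search` endpoint, while the scraper side would pass each record through `trim_for_index` before indexing it.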
@rajatkb rajatkb added enhancement New feature or request good first issue Good for newcomers beginner GSSOC label for beginner tag labels Feb 22, 2020
@rajatkb rajatkb added medium GSSOC label for beginner tag gssoc20 GSSOC label for gscco20 tag and removed beginner GSSOC label for beginner tag labels Feb 22, 2020
@rajatkb rajatkb changed the title Design suggestion for Search-Service [Base] design suggestion for Search-Service Feb 28, 2020
@vipuldcoder

I am interested in this issue.

@secretshardul

Here are the different alternatives:

  1. Use MongoDB connector: Transports data from Mongo to ES. Needs 3 servers (Mongo, ES, MongoDB connector).
  2. Use Logstash for ingestion: Express makes API calls to Logstash, which in turn pushes data to ES. Needs 3 servers. You can make Express write directly into ES, but that's not good practice.
  3. Use Elasticsearch as the database and remove MongoDB: ES itself is a NoSQL database. To search content you have to load all the data into it anyway, with the same number of API calls, so you might as well store it there. Just a single server to maintain.
  4. Use Algolia: Fully managed search. Removes the hassle of managing a search server. Algolia provides free service for open source projects. Medium.com and Stripe use Algolia.
  5. MongoDB Atlas Search: Integrates a search feature into MongoDB itself.

I'd say go with option 3, 4, or 5. Also, Elasticsearch maintenance is a different beast altogether. Try using AWS Elasticsearch or the official Elasticsearch hosting rather than rolling out your own ES server. They handle server management for you.

P.S. I have worked extensively with ES before.

@rajatkb
Owner Author

rajatkb commented Apr 5, 2020

  • Some of the operations require standard numerical operations and other custom ops, which Mongo seems better suited for. ES can slow down the scraping process (rapid insertion). As you already mentioned, managing ES is a beast of a task in itself. Handling indexes for the data inserted by the scraper would be another hassle on top of that.

  • I have looked into Algolia, but using it would require them to sponsor this project.

  • I did consider the MongoDB text search functionality. The problem is that, as far as I know, it doesn't support partial text matching. ES does a better job with it. Each conference record has a large block of text attached describing the conference, and ES can work on that.
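
One common way ES handles partial matches is an edge n-gram analyzer, which indexes the prefixes of each word. The sketch below shows index settings of that kind, plus a small pure-Python function that mimics what the filter emits for one token. The analyzer and filter names, the `description` field, and the gram sizes are assumptions, not settings from this project.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the prefixes an edge n-gram filter would emit for one token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# Index settings using the built-in "edge_ngram" token filter.
# "prefix_filter" and "prefix_analyzer" are made-up names for this sketch.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "prefix_filter": {"type": "edge_ngram", "min_gram": 2, "max_gram": 10}
            },
            "analyzer": {
                "prefix_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "prefix_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                "analyzer": "prefix_analyzer",   # applied when indexing
                "search_analyzer": "standard",   # query text is not n-grammed
            }
        }
    },
}

print(edge_ngrams("conference", 2, 5))  # ['co', 'con', 'conf', 'confe']
```

With settings like these, a query for "conf" can match a description containing "conference", at the cost of a larger index.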

@secretshardul

You should go for the MongoDB connector then. If you plan to use AWS, a good alternative would be:

  1. Push scraped data in .json format into S3.
  2. Use Kinesis Firehose (an AWS service for streams) or Logstash (you will have to host the server) to ingest the data into ES.
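
Ingesters such as Logstash typically write to Elasticsearch through its bulk API, which takes newline-delimited JSON: an action line followed by the document itself. A minimal sketch of converting scraped records into that payload, assuming a hypothetical `conferences` index and an `_id` field on each record:

```python
import json


def to_bulk_ndjson(records, index="conferences"):
    """Serialize records as a bulk-API payload (action line + source line each)."""
    lines = []
    for record in records:
        doc = dict(record)            # copy so the caller's record is untouched
        doc_id = str(doc.pop("_id"))  # "_id" goes in the action line, not the body
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


payload = to_bulk_ndjson([{"_id": 1, "title": "ICML"}])
print(payload, end="")
```

Reusing the MongoDB `_id` as the ES document id makes re-ingesting the same file overwrite documents instead of duplicating them.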

@rajatkb
Owner Author

rajatkb commented Apr 5, 2020

  • Yup, can do. For the ES service, I have not planned the deployment yet, since I am trying to avoid paid services. The actual application will be deployed with a Heroku + Atlas Mongo combination. I am not yet sure about the ES service.

  • The Mongo connector option seems to be the most resilient one. I suppose it takes care of data consistency between MongoDB and ES.

  • Also, I would advise talking to the mentor looking into the ES side of the project as well.
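
To make the consistency point concrete: a connector keeps the search index in step with MongoDB by replaying each write against ES. The sketch below mimics that idea in memory with a simplified, hypothetical change-event shape (`op`, `id`, `doc`); it is not the connector's actual API or MongoDB's real oplog format.

```python
def apply_change(index, event):
    """Replay one change event against an in-memory stand-in for the ES index."""
    op, doc_id = event["op"], event["id"]
    if op in ("insert", "update"):
        # Upsert: merge the changed fields into whatever is already indexed.
        index[doc_id] = {**index.get(doc_id, {}), **event["doc"]}
    elif op == "delete":
        index.pop(doc_id, None)  # deleting a missing document is a no-op
    else:
        raise ValueError(f"unknown op: {op}")
    return index


events = [
    {"op": "insert", "id": "1", "doc": {"title": "ICML", "year": 2020}},
    {"op": "update", "id": "1", "doc": {"year": 2021}},
    {"op": "insert", "id": "2", "doc": {"title": "NeurIPS"}},
    {"op": "delete", "id": "2"},
]
index = {}
for event in events:
    apply_change(index, event)
print(index)  # {'1': {'title': 'ICML', 'year': 2021}}
```

As long as events are replayed in order, the index converges to the same state as the source collection.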

3 participants