Commit b061358 (1 parent: ebe0f3d). Showing 58 changed files with 646 additions and 177 deletions.
173 changes: 173 additions & 0 deletions
173
docs/about-me/projects/70-interview-datainsight-alerting-engine.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,173 @@ | ||
# Interview - Datainsight / Alerting Engine | ||
|
||
## Data Insight | ||
|
||
Your company is developing a new cloud-based analytics platform, "DataInsight," designed to provide advanced data processing and visualization capabilities for enterprise clients. The platform integrates with various data sources (telemetry metrics), uses machine learning for predictive analytics, and offers a highly interactive and customizable dashboard for end users.
|
||
### Technical architecture | ||
- Ingestion to database | ||
- Database to services (visualization and analytics including ML) | ||
|
||
### Integration Setbacks (Technical Problem) | ||
|
||
The platform struggles to integrate smoothly with a wide range of customer databases, especially legacy systems, which causes

- delays
- data inconsistency issues

Client sensors do not share a common set of communication protocols either.
|
||
#### Assumption | ||
- We have standardized data at our end in InfluxDB
- Clients (50-5000)
    - Branches / Locations - (less - 500-250,000)
    - Sensors - measurement (500 sensors, millions)
- Table schema
    - (row oriented, column oriented, non-normalized)
    - Clients
        - client_id pk
    - Locations
        - client fk
        - location_id pk
    - Raw Sensor Data
        - location_id fk
        - sensor_type_fk - fk
        - values
        - timestamp
    - Sensor Type
        - id
        - sensor_id - 5
        - parameter - 1 (temp)
        - metadata - temp (5, 1), humidity (5, 2)
        - pk - (sensor_id + parameter)
- Query
    - Aggregate - last week average
    - 15-16 values
- Device (not a table) - AC
    - What temperature I am set at
    - What current and voltage I am consuming
- Multi-tenant / single-tenant
    - multi-tenant system
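
The table layout above can be sketched as plain Python records, with the "last week average" query written as a filter over in-memory rows. This is only an illustration under assumptions: the names `SensorType`, `RawSensorReading` and `last_week_average` are hypothetical, and a real deployment would most likely run the aggregate inside InfluxDB rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Tuple


@dataclass(frozen=True)
class SensorType:
    sensor_id: int   # e.g. 5
    parameter: int   # e.g. 1 for temperature, 2 for humidity
    metadata: str    # e.g. "temp"

    @property
    def key(self) -> Tuple[int, int]:
        # Composite primary key: (sensor_id + parameter)
        return (self.sensor_id, self.parameter)


@dataclass(frozen=True)
class RawSensorReading:
    location_id: int
    sensor_type: Tuple[int, int]  # foreign key to SensorType.key
    value: float
    timestamp: datetime


def last_week_average(readings, location_id, sensor_type, now):
    """Average of one sensor parameter at one location over the last 7 days."""
    since = now - timedelta(days=7)
    values = [
        r.value for r in readings
        if r.location_id == location_id
        and r.sensor_type == sensor_type
        and since <= r.timestamp <= now
    ]
    # No readings in the window means there is nothing to average.
    return mean(values) if values else None
```

Keeping `(sensor_id, parameter)` as the key means one physical sensor can report several parameters (temperature and humidity in the notes) without a separate table per parameter.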
|
||
#### Problems | ||
- wide range of customer databases | ||
|
||
#### Solutions | ||
- Standard set of Ingestion APIs, so integration should be simpler | ||
- Producers can be standardized | ||
- Serialization / deserialization | ||
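
A standardized ingestion contract could look roughly like the sketch below: producers serialize one fixed record shape, and the ingestion side rejects anything that does not match. The field names, the `Measurement` class and the choice of JSON are assumptions made for illustration; the notes do not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical field set for the standard ingestion payload.
REQUIRED_FIELDS = {"client_id", "location_id", "sensor_id", "parameter", "value", "timestamp"}


@dataclass(frozen=True)
class Measurement:
    client_id: str
    location_id: str
    sensor_id: int
    parameter: int
    value: float
    timestamp: str  # ISO 8601 string, assumed to be UTC


def serialize(measurement: Measurement) -> bytes:
    """Producer side: encode a measurement as compact JSON bytes."""
    return json.dumps(asdict(measurement), separators=(",", ":")).encode("utf-8")


def deserialize(payload: bytes) -> Measurement:
    """Ingestion side: decode and reject payloads that miss required fields."""
    data = json.loads(payload.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Extra fields are ignored so producers can add data without breaking ingestion.
    return Measurement(**{name: data[name] for name in REQUIRED_FIELDS})
```

With a single contract like this, a legacy customer database only needs a small adapter that emits `Measurement` records, instead of a custom integration per source.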
|
||
### Technological Hurdles (Technical Problem) | ||
|
||
The machine learning models are not performing as expected, leading to inaccurate predictions and inefficient data processing. The cause is unclear, whether it's due to poor model selection, inadequate training data, or something else. | ||
|
||
#### Visualization (end to end from databases to visualization service) | ||
|
||
##### Assumption | ||
1. Sensor | ||
|
||
##### Tech architecture | ||
|
||
- Database
- Backend service
    - APIs for Highcharts that will give the sensor data
    - Batch reporting APIs
- Frontend service - show users the end data
    - IAM
    - React / Angular - Client side rendered
        - Client side app
        - Don't need to send containers all the time
    - Responsive - Desktop / Mobile
    - Visualization library - Highcharts
|
||
##### Questions | ||
1. How much data (10 years, 1 year)
    1. Roll ups
    2. Accuracy of old data
    3. Data tiering
    4. Query speed
2. Batch Reporting - for downloading large datasets, export
    1. Add queueing and do asynchronous processing
    2. Wait for data, either in mail, or downloads page
3. **Real time analytics**
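
To make the roll-up question concrete, here is a minimal sketch that collapses raw points into one average per hour. The function name `hourly_rollup` and the one-hour bucket are assumptions; in practice the roll-up would probably be a scheduled job or a database-side downsampling task rather than application code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_rollup(points):
    """Collapse (timestamp, value) points into one average per hour."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate the timestamp to the start of its hour to find the bucket.
        bucket_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}
```

The trade-off is the one listed above: rolled-up data is cheaper to store and faster to query, but the accuracy of old data drops to the bucket resolution.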
|
||
## Low Latency Alerting System | ||
|
||
Time series data is being published from IoT devices (MQTT) across the building. There are various data types like efficiency, energy consumption, equipment on/off, etc.
|
||
The user wants different categories of alerts like critical, high, medium, and low. | ||
|
||
- Critical - needs to be initiated instantly (within milliseconds).
- High - needs to be initiated within 10 seconds to 1 minute.
- Medium - needs to be initiated within 5 minutes to 6 hours.
- Low - can be sent up to once a day or week.
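
One way to encode these categories is a table of maximum delays plus a rule that picks a dispatch path per priority. This is a sketch under assumptions: the names `Priority`, `MAX_DELAY` and `delivery_mode` are hypothetical, the three dispatch paths are one possible design, and the 100 ms budget for critical alerts is a placeholder because the requirement only says "milliseconds".

```python
from datetime import timedelta
from enum import Enum


class Priority(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Upper bound on time-to-initiate per priority, taken from the list above.
# The critical budget is an assumed placeholder for "within milliseconds".
MAX_DELAY = {
    Priority.CRITICAL: timedelta(milliseconds=100),
    Priority.HIGH: timedelta(minutes=1),
    Priority.MEDIUM: timedelta(hours=6),
    Priority.LOW: timedelta(weeks=1),
}


def delivery_mode(priority: Priority) -> str:
    """Pick a dispatch path: immediate send, short queue, or scheduled digest."""
    if priority is Priority.CRITICAL:
        return "inline"   # send on the hot path, no batching
    if priority is Priority.HIGH:
        return "queue"    # async worker, drained continuously
    return "digest"       # batched and sent on a schedule
```

Separating the paths keeps low-priority digests from competing with critical alerts for the same workers.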
|
||
Whenever an alerting condition is met, the system should trigger an alert and send it to all subscribed users via their preferred mode, using the template below:
|
||
- Alert Summary | ||
- Timestamp | ||
- Priority | ||
- Asset Name | ||
- Suggested Action, e.g. "Most likely automation commands are not working on the chiller. Please switch the chiller to JouleTrack mode, turn it on, and then return it to recipe mode."
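
The template maps naturally onto a small record type with one rendering method shared by every channel. The class name `Alert` and the plain-text layout are assumptions for illustration; each channel (push, WhatsApp, email) would likely need its own formatting on top.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Alert:
    summary: str
    timestamp: datetime
    priority: str
    asset_name: str
    suggested_action: str

    def render(self) -> str:
        """Plain-text body with one line per template field."""
        return "\n".join([
            f"Alert Summary: {self.summary}",
            f"Timestamp: {self.timestamp.isoformat()}",
            f"Priority: {self.priority}",
            f"Asset Name: {self.asset_name}",
            f"Suggested Action: {self.suggested_action}",
        ])
```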
|
||
Which system components can be brought together to send user alerts over different mediums (push notification, WhatsApp, email, message)?
|
||
Instruction | ||
|
||
1. Please create a component-level diagram to discuss the solution design.
2. The solution should be low-latency, fault-tolerant and distributed in nature, and easy to scale to serve millions of monthly alerts to users.
3. Provide a data modeling diagram to represent how things are connected to each other and how data is flowing.
4. Trade-off decisions should be highlighted.
|
||
### Questions | ||
|
||
1. size of data? | ||
2. Rate of stream? - 10K per second | ||
|
||
### Solution | ||
|
||
- Alert mapping table (in memory or Redis)
- Subscribe to MQTT topics
- Log to alerts too - RDBMS
- Async processing
- Better processing
    - Kafka Streams
    - Apache Pulsar
    - Druid
- Consumer Lag
|
||
## Architecture | ||
|
||
![mqtt-alerting-engine](../../media/MQTT%20Alerting%20Engine.drawio.png) | ||
|
||
- Alert mapping table - Main copy in RDBMS
    - Pushed copy in Redis
    - Updates pushed to Redis from RDBMS whenever changed
- Backend service - MQTT broker to Redis streams conversion for creating consumer group
    - Redis streams (single topic)
- Backend service - alert processing module
    - Process the packet
    - Match it with alert mapping table
    - Push to Redis streams communication channel
- Backend service - alerting engine
    - Log the incoming alert
    - Call the communications API (based on alert mapping, or alert processing module can **send the channel too**)
        - SMS
        - Push
        - Telegram, etc
    - Can send to multiple customers
    - Webhook for delivery status
- Scaling ways
    - Multithreading
    - Asyncio
    - Horizontal Scaling
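
The alert processing step (take a packet, match it against the alert mapping table, push hits to the communication channel) can be sketched as below. Everything here is an in-memory stand-in chosen for illustration: `ALERT_MAPPING` plays the role of the Redis copy of the mapping table, `communication_stream` plays the role of the Redis stream, and the `Rule` fields and packet shape are hypothetical.

```python
import operator
from collections import deque
from dataclasses import dataclass
from typing import Tuple

COMPARATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


@dataclass(frozen=True)
class Rule:
    metric: str
    comparator: str
    threshold: float
    priority: str
    channels: Tuple[str, ...]


# Stand-in for the Redis copy of the alert mapping table, keyed by asset.
ALERT_MAPPING = {
    "chiller-1": [Rule("efficiency", "<", 0.6, "high", ("sms", "push"))],
}

# Stand-in for the Redis stream that feeds the alerting engine.
communication_stream = deque()


def process_packet(packet: dict) -> int:
    """Match one telemetry packet against its asset's rules; enqueue any hits."""
    hits = 0
    for rule in ALERT_MAPPING.get(packet["asset"], []):
        value = packet["metrics"].get(rule.metric)
        if value is not None and COMPARATORS[rule.comparator](value, rule.threshold):
            # Carrying the channels with the alert lets the alerting engine
            # skip a second lookup in the mapping table.
            communication_stream.append({
                "asset": packet["asset"],
                "metric": rule.metric,
                "value": value,
                "priority": rule.priority,
                "channels": rule.channels,
            })
            hits += 1
    return hits
```

Because the function only reads the mapping table and appends to a stream, several copies can run in parallel behind a consumer group, which is what makes horizontal scaling straightforward.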
|
||
##### Followups | ||
- Duplicate alerts
- Automatic resolutions
- Fire and forget
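
For the duplicate follow-up, one common approach is to suppress repeats of the same alert inside a time window. The sketch below is an assumption about how that could work, not a design taken from the notes; `Deduplicator` and the `(asset, rule_id)` key are hypothetical, and a distributed setup would keep this state in a shared store rather than in process memory.

```python
class Deduplicator:
    """Suppress repeats of the same (asset, rule) alert inside a time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._last_sent = {}

    def should_send(self, asset: str, rule_id: str, now: float) -> bool:
        key = (asset, rule_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # same alert was sent recently, drop this one
        # Only sent alerts move the window, so suppressed repeats do not extend it.
        self._last_sent[key] = now
        return True
```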
|
||
### Alerting Exceptions Handling | ||
|
||
![alerting-exceptions-handling-flow](../../media/Communication%20exception%20flow.png) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Stashfin Customer Support | ||
|
||
### Improvements | ||
|
||
#### Proactive Communication | ||
|
||
- Implement timely notifications for users about upcoming payments, EMIs, and system maintenance. | ||
- Provide clear and transparent information on payment breakdowns and costs. | ||
|
||
#### Enhanced User Interface | ||
|
||
- Redesign the online platform for a user-friendly experience. | ||
- Include easy-to-use payment options and troubleshoot common issues. | ||
|
||
#### Real-time Issue Resolution | ||
|
||
- Introduce a real-time customer support chat for immediate issue resolution. (YM) | ||
- Establish a dedicated support team to address payment-related concerns promptly. | ||
|
||
#### Education and Support | ||
|
||
- Develop user guides and FAQs to educate customers about the lending platform. | ||
- Offer online tutorials and resources to guide users through the payment process. | ||
|
||
#### Others | ||
|
||
- Added Hindi language support | ||
- CSAT of agents | ||
- Moved from fixed pricing for CS agents to dynamic pricing, where payments are based on CSAT
- Canned responses for CS agents for increased productivity | ||
|
||
### KPIs | ||
|
||
- Total tickets | ||
- Number of unique customers | ||
- **Tickets > 24 hours** | ||
- Ticket Resolution TAT | ||
- Ticket Closed TAT | ||
- Ticket First Contact TAT | ||
- Chat First Contact TAT | ||
- **Ticket Reopen Count** | ||
- **Ticket Reopen Percent** | ||
- Play Store Rating | ||
- Quality Audit Score | ||
- RBI Escalation | ||
|
||
### Creating and maintaining product roadmaps | ||
|
||
![product-roadmap-example](../../media/Pasted%20image%2020231201183958.png) |
58 changes: 58 additions & 0 deletions
58
docs/about-me/projects/87-stashfin-team-management-culture.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# Stashfin Team Management / Culture | ||
|
||
### Hiring | ||
|
||
- Automated the hiring process | ||
- Hired around 12 freshers and 5 senior developers | ||
- Processes around onboarding and automated team pulses | ||
- Onboarding tasks + manager and team introductions + onboarding buddy | ||
|
||
### Culture | ||
|
||
- Processes around reviews, appraisals and feedback
- Project management tool - first GitLab issue boards, then moved to Jira
- Daily standups + scrums | ||
- Documentation | ||
|
||
### Mandatory Code Reviews | ||
|
||
- At least 2 approvals, one from a senior dev and one from a junior dev, are mandatory for merging the code
- Using Git and a proper PR process. Every feature or bug fix is a separate branch and submitted as a PR | ||
|
||
![stashfin-git-review-process](../../media/Pasted%20image%2020231201181214.png) | ||
|
||
### Scrum / Kanban / Project Management | ||
|
||
- Implemented Agile project management methodology across teams | ||
|
||
![example-scrum-board](../../media/Pasted%20image%2020231201181414.png) | ||
|
||
## Documentation | ||
|
||
- Used a combination of Google Docs with team folders, etc
- Introduced Confluence for documentation
|
||
![example-confluence-documentation](../../media/Pasted%20image%2020231201181347.png) | ||
|
||
### Process for documentation
|
||
#### ADRs (Architecture Decision Records) and HLD (High Level Diagrams)
|
||
- ADRs are documents that capture the important decisions regarding the architecture of our software. They serve as a record of the context, options considered, and the rationale behind the chosen solution. | ||
- HLD provides an overview of the system architecture, major components, and their interactions. It helps in aligning the team and stakeholders on the overall structure of the application. | ||
- We should update the HLD whenever there are significant changes to the system architecture. It serves as a reference for new team members and ensures everyone has a shared understanding of the system. | ||
|
||
![high-level-diagram-example](../../media/Pasted%20image%2020231201183011.png) | ||
|
||
#### LLD (Low Level Diagrams) and ER (Entity Relationship Diagrams) | ||
|
||
- LLD dives into the details of individual components or modules. It includes class diagrams, data flow diagrams, and other specifics that guide the implementation. | ||
- LLD documents are often created in collaboration with the development team. They serve as a valuable resource during the implementation phase and aid in code reviews. | ||
- ERDs visually represent the relationships between entities in our database. They are crucial for understanding the data model and ensuring that it aligns with the requirements. | ||
- Each table and its relationships are clearly defined in the ERD, making it easier for developers, database administrators, and stakeholders to comprehend the data structure. | ||
- We include ERDs as part of our documentation to maintain a clear understanding of the database schema. This is especially helpful during database migrations or when onboarding new team members. | ||
- During code reviews or discussions about database changes, referring to the ERD ensures that everyone is on the same page regarding the data model. | ||
|
||
![low-level-diagram-example](../../media/Pasted%20image%2020231201183115.png) | ||
|
||
![entity-relationship-diagram](../../media/Pasted%20image%2020231201183143.png) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Stashfin Security / IAM / APIs | ||
|
||
## Security | ||
|
||
### External + processes | ||
|
||
Followed AAA (Authentication, Authorization and Audit) with best practices
|
||
- Cloudflare WAF, plus fixing all our own APIs that were getting blocked because of unsafe practices such as non-escaped or unsanitized inputs
- Regular VAPT tests and fixing each vulnerability; reports were scheduled monthly so that the whole team could address multiple vulnerabilities in one sitting during monthly hackathons
- Regular, automated rotation of all keys
- Service mesh + Cloudflare for monitoring all APIs and status codes
|
||
### Internal | ||
|
||
- Added a private VPN and blocked all traffic coming from the internet to internal applications; added OpenVPN for developers to access internal resources and a jump server to access internal compute instances
- Eventually moved to zero trust access, between applications too, where each application has its own API keys to access another application
- Centralized internal IAM for internal users’ application access, using Keycloak for DevOps resources and django-admin for other applications
- Using groups for permissions and adding users to groups instead of individual permissions
- Immutable audit logs for all changes
- No access to production databases
- Rate limits for DDoS protection in open public client APIs
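
As an illustration of the rate-limit point, a token bucket is one widely used scheme: each client gets a bucket that allows short bursts and refills at a steady rate. This is a generic sketch, not the mechanism actually used here (which may well be configured at the Cloudflare or gateway level); the `TokenBucket` name and its parameters are hypothetical.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so the first burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Passing `now` explicitly keeps the class deterministic in tests; a caller would typically pass `time.monotonic()` and keep one bucket per API key or client IP.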
|
||
### Immutable Logs for Audit | ||
|
||
![stashfin-immutable-audit-logs](../../media/Pasted%20image%2020231201175020.png) | ||
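
One generic way to make an audit log tamper-evident is to chain entries by hash, so that editing or reordering any record invalidates everything after it. The sketch below is an assumption for illustration only and is not derived from the diagram above; `append_entry` and `verify` are hypothetical helpers.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log: list, change: dict) -> dict:
    """Append a change record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = json.dumps(change, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    entry = {"change": change, "prev_hash": prev_hash, "hash": digest}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = json.dumps(entry["change"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```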
|
||
### Authorization | ||
|
||
![stashfin-authorization](../../media/Pasted%20image%2020231201175035.png) | ||
|
||
### Postman implementation and documentation of all APIs | ||
|
||
[Stashfin Partners API](https://documenter.getpostman.com/view/16927648/TzzGGtg9) | ||
|
||
![stashfin-screenshot](../../media/Pasted%20image%2020231201175731.png) | ||
|
||
![stashfin-screenshot](../../media/Pasted%20image%2020231201175751.png) | ||
|
||
#### API Testing | ||
|
||
![stashfin-screenshot](../../media/Pasted%20image%2020231201175759.png) | ||
|
||
## WebView Implementations | ||
|
||
### WebView inside Apps | ||
|
||
- Customer support | ||
- Payments | ||
- New products | ||
- referral | ||
- brand ambassador program | ||
- stashearn | ||
|
||
![stashfin-screenshot](../../media/Pasted%20image%2020231201180310.png)
|
||
![stashfin-screenshot](../../media/Pasted%20image%2020231201180349.png) | ||
|
||
![stashfin-screenshot](../../media/Pasted%20image%2020231201180442.png) |