Commit

updated docs
deepaksood619 committed Nov 20, 2023
1 parent cecc04e commit 65cc103
Showing 49 changed files with 78 additions and 81 deletions.
20 changes: 10 additions & 10 deletions docs/ai/data-science/big-data/data-preprocessing.md
@@ -5,8 +5,8 @@
1. Aggregation
2. Attribute Transformation
3. Dimensionality Reduction
- Feature creation
- Feature subset selection
- Feature creation
- Feature subset selection
4. Discretization and Binarization
5. Sampling

@@ -23,7 +23,7 @@

### Discretization

![image](media/Data-Preprocessing-image1.jpg)
![image](../../../media/Data-Preprocessing-image1.jpg)

### Attribute Transformation

@@ -48,39 +48,39 @@

p and q are the attribute values for two data objects

![image](media/Data-Preprocessing-image2.jpg)
![image](../../../media/Data-Preprocessing-image2.jpg)

### Euclidean Distance

![image](media/Data-Preprocessing-image3.jpg)
![image](../../../media/Data-Preprocessing-image3.jpg)

- Where $n$ is the number of dimensions (attributes) and $p_k$ and $q_k$ are, respectively, the $k$th attributes (components) of data objects p and q.
- Standardization is necessary if scales differ
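
A minimal Python sketch of the Euclidean distance formula, plus z-score standardization for when attribute scales differ. The function names and the sample age/income rows are illustrative assumptions, not taken from the source.

```python
import math
import statistics


def euclidean(p, q):
    """Square root of the summed squared differences over the n attributes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of attributes")
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


def standardize_columns(rows):
    """Z-score each attribute (column) so no single scale dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # A constant column has stdev 0; fall back to 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


print(euclidean([0, 2], [3, 6]))  # 5.0

# Hypothetical [age, income] rows: income would swamp age without standardization.
people = [[25, 50_000], [35, 52_000], [45, 90_000]]
scaled = standardize_columns(people)
print(euclidean(*people[:2]), euclidean(*scaled[:2]))
```

On the raw rows the distance is driven almost entirely by the income column; after standardization both attributes contribute on a comparable scale.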

### Mahalanobis Distance

![image](media/Data-Preprocessing-image4.jpg)
![image](../../../media/Data-Preprocessing-image4.jpg)

- For the red points, the Euclidean distance is 14.7 and the Mahalanobis distance is 6
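
A 2-D sketch of Mahalanobis distance, $\sqrt{(p-q)^T \Sigma^{-1} (p-q)}$, using an explicit 2x2 matrix inverse so it needs only the standard library. The covariance matrices are made-up examples. Some lecture slides define the measure without the square root, so check which form the figure uses before comparing numbers with the 14.7 vs 6 example.

```python
import math


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between 2-D points p and q for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [p[0] - q[0], p[1] - q[1]]
    # Quadratic form diff^T * inv * diff.
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(quad)


# With the identity covariance, Mahalanobis reduces to Euclidean distance.
print(mahalanobis_2d((0, 0), (3, 4), [[1, 0], [0, 1]]))  # 5.0
# High variance along x shrinks distances measured along x.
print(mahalanobis_2d((0, 0), (2, 0), [[4, 0], [0, 1]]))  # 1.0
```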

### Cosine Similarity

![image](media/Data-Preprocessing-image5.jpg)
![image](../../../media/Data-Preprocessing-image5.jpg)
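
Cosine similarity divides the dot product of two vectors by the product of their lengths, so it measures the angle between them and ignores magnitude. A minimal sketch; the two term-frequency style vectors are illustrative values.

```python
import math


def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)."""
    dot = sum(x * y for x, y in zip(d1, d2))
    norm1 = math.sqrt(sum(x * x for x in d1))
    norm2 = math.sqrt(sum(y * y for y in d2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))  # 0.315
```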

### Similarity Between Binary Vectors

![image](media/Data-Preprocessing-image6.jpg)
![image](../../../media/Data-Preprocessing-image6.jpg)
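
For binary vectors, similarity is usually computed from the match counts M11, M00, M10 and M01. This sketch assumes the figure covers the two common measures: the Simple Matching Coefficient (all matches over all attributes) and the Jaccard coefficient (1-1 matches over attributes that are not both 0). The vectors are illustrative.

```python
def binary_similarity(p, q):
    """Return (SMC, Jaccard) for two equal-length 0/1 vectors."""
    m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
    m00 = sum(1 for a, b in zip(p, q) if a == 0 and b == 0)
    m10 = sum(1 for a, b in zip(p, q) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(p, q) if a == 0 and b == 1)
    smc = (m11 + m00) / (m11 + m00 + m10 + m01)
    # Jaccard ignores 0-0 matches; treat it as 0 when neither vector has a 1.
    denom = m11 + m10 + m01
    jaccard = m11 / denom if denom else 0.0
    return smc, jaccard


p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(binary_similarity(p, q))  # (0.7, 0.0)
```

The gap between the two results shows why Jaccard is preferred for sparse, asymmetric data: the many shared zeros make SMC look high even though the vectors share no 1s.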

## Correlation

- Correlation measures the linear relationship between objects
- To compute correlation, we standardize data objects, p and q, and then take their dot product (divided by $n - 1$ when the sample standard deviation is used)

![image](media/Data-Preprocessing-image7.jpg)
![image](../../../media/Data-Preprocessing-image7.jpg)
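
The standardize-then-dot-product recipe, sketched in Python. Dividing the dot product by $n - 1$ (matching the sample standard deviation) yields the Pearson correlation coefficient in the range -1 to 1.

```python
import statistics


def correlation(p, q):
    """Pearson correlation: dot product of the standardized vectors, divided by n - 1."""
    n = len(p)
    mp, mq = statistics.mean(p), statistics.mean(q)
    sp, sq = statistics.stdev(p), statistics.stdev(q)  # sample standard deviation
    p_std = [(x - mp) / sp for x in p]
    q_std = [(y - mq) / sq for y in q]
    return sum(a * b for a, b in zip(p_std, q_std)) / (n - 1)


print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive linear relation)
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect negative linear relation)
```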

### Visually Evaluating Correlation

![image](media/Data-Preprocessing-image8.jpg)
![image](../../../media/Data-Preprocessing-image8.jpg)

- Scatter plots showing correlation values ranging from -1 to 1

2 changes: 1 addition & 1 deletion docs/ai/data-science/big-data/data-quality.md
@@ -35,4 +35,4 @@
- Examples:
- Same person with multiple email addresses
- Data cleaning
- Process of dealing with duplicate data issues
- Process of dealing with duplicate data issues
@@ -199,4 +199,5 @@ Types of Clustering (3:39)
Dendrogram (5:21)

Heatmaps (4:34)

<https://365datascience.teachable.com/courses/enrolled/362812>
1 change: 1 addition & 0 deletions docs/computer-science/courses/self-driving-nanodegree.md
@@ -83,6 +83,7 @@ Tools
- Project: [Traffic Sign Classifier](https://classroom.udacity.com/nanodegrees/nd013/parts/edf28735-efc1-4b99-8fbb-ba9c432239c8/modules/6b6c37bc-13a5-47c7-88ed-eb1fce9789a0/lessons/7ee8d0d4-561e-4101-8615-66e0ab8ea8c8/project)
- Project: [Behavioral Cloning](https://classroom.udacity.com/nanodegrees/nd013/parts/edf28735-efc1-4b99-8fbb-ba9c432239c8/modules/6b6c37bc-13a5-47c7-88ed-eb1fce9789a0/lessons/3fc8dd70-23b3-4f49-86eb-a8707f71f8dd/project)
- Project: [Extended Kalman Filters](https://classroom.udacity.com/nanodegrees/nd013/parts/edf28735-efc1-4b99-8fbb-ba9c432239c8/modules/49d8fda9-69c7-4f10-aa18-dc3a2d790cbe/lessons/3feb3671-6252-4c25-adf0-e963af4d9d4a/project)

<https://www.freecodecamp.org/news/perception-for-self-driving-cars-deep-learning-course>

[Comic book panel segmentation • Max Halford](https://maxhalford.github.io/blog/comic-book-panel-segmentation/)
@@ -9,27 +9,20 @@
## Core Features

1. Tweeting

2. Timeline
- User (Own tweets in profile)
- Home (All tweets from people you follow)

3. Following

4. Full Text Search

5. HashTags

6. Push Notifications

7. Text Notifications

8. How to incorporate Advertisements

## Database

1. Tweets database

2. Users database

The problem with this structure is that getting the tweets corresponding to a user would take a lot of time, because it requires a big select query every time we open the Twitter home timeline.
@@ -139,6 +139,7 @@ Branch prediction is not the same as [branch target prediction](https://en.wikip
- [1.14 Prediction of function returns](https://en.wikipedia.org/wiki/Branch_predictor#Prediction_of_function_returns)
- [1.15 Overriding branch prediction](https://en.wikipedia.org/wiki/Branch_predictor#Overriding_branch_prediction)
- [1.16 Neural branch prediction](https://en.wikipedia.org/wiki/Branch_predictor#Neural_branch_prediction)

<https://en.wikipedia.org/wiki/Branch_predictor>

## Application Binary Interface (ABI)
@@ -258,12 +258,10 @@ Atomic instruction that compares contents of a memory location M to a given valu

<https://en.wikipedia.org/wiki/Compare-and-swap>

## See also

Python > Advanced > Concurrency

## References

[Concurrency](python/advanced/concurrency.md)

<https://schneems.com/2017/10/23/wtf-is-a-thread/>

Dijkstra's Guarded Commands - <https://en.wikipedia.org/wiki/Guarded_Command_Language>
8 changes: 2 additions & 6 deletions docs/computer-science/programming-concepts/type-systems.md
@@ -93,16 +93,12 @@ As opposed to strong typed languages, weak typed languages are those in which va

Thus, Python is dynamically and strongly typed; Java is statically and strongly typed; PHP is dynamically and weakly typed; C is statically and weakly typed (owing to its casting ability).

## See Also

- Programming Styles > Duck typing

## Others

- Algebraic Data Types (ADT)

## References

[Functional Programming: Type Systems](https://www.youtube.com/watch?v=hy1wjkcIBCU)
Programming Styles > Duck typing

![image](../../media/Type-Systems-image2.jpg)
[Functional Programming: Type Systems](https://www.youtube.com/watch?v=hy1wjkcIBCU)
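
A short Python illustration of the "dynamic and strong" classification of Python: a name can be rebound to a value of a different type at run time, but the interpreter refuses to implicitly coerce between unrelated types such as `str` and `int`.

```python
x = 42           # x currently refers to an int
x = "forty-two"  # rebinding to a str is allowed: types belong to values, not names
print(type(x).__name__)  # str

try:
    result = "1" + 1  # no implicit str/int coercion (strong typing)
except TypeError as exc:
    result = f"TypeError: {exc}"
print(result)

# Conversions must be explicit.
print("1" + str(1), int("1") + 1)  # 11 2
```

A weakly typed language such as PHP would instead coerce one operand and produce a value for `"1" + 1`.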
@@ -15,5 +15,6 @@ In a public key encryption system, any person can encrypt a message using the re
![image](../../../media/Cryptography-Intro_Public-key-cryptography-image2.jpg)

![image](../../../media/Cryptography-Intro_Public-key-cryptography-image3.jpg)

<https://medium.com/sitewards/the-magic-of-tls-x509-and-mutual-authentication-explained-b2162dec4401>
![image](../../../media/Cryptography-Intro_Public-key-cryptography-image4.jpg)
1 change: 1 addition & 0 deletions docs/computer-science/security/vault.md
@@ -197,6 +197,7 @@ Vault's [database secrets engine](https://www.vaultproject.io/docs/secrets/datab
This reduces the manual tasks performed by the database administrator and makes the database access more efficient and secure.
![image](../../media/Vault-image7.jpg)
<https://learn.hashicorp.com/tutorials/vault/database-root-rotation>
<https://learn.hashicorp.com/tutorials/vault/database-secrets>
1 change: 1 addition & 0 deletions docs/computer-science/system-design/api-gateway.md
@@ -101,6 +101,7 @@ Key features include:

<https://github.com/datawire/ambassador>
![image](../../media/API-Gateway-image1.jpg)

<https://microservices.io/patterns/apigateway.html>

Rate Limiting Service
12 changes: 9 additions & 3 deletions docs/computer-science/system-design/event-driven-architecture.md
@@ -40,16 +40,16 @@ Replaying all the events from the logs can give any state of the system at any t
- Examples:

1. Version control - git

2. Accounting ledgers

- Can be used for
- Audit
- Debugging
- Historic State
- Alternative State
- Memory Image

4. CQRS - Command and Query Responsibility Segregation
2. CQRS - Command and Query Responsibility Segregation

CQRS is a fancy name for an architecture that uses different data models to represent read and write operations.
At its heart is the notion that you can use a different model to update information than the model you use to read information.
@@ -58,7 +58,7 @@ At its heart is the notion that you can use a different model to update informat

Event-driven architecture (EDA) means constructing your system as a series of commands and/or events. A user submits an online form to make a purchase: that's a command. The items in stock are reserved: that's an event. A confirmation is sent to the user: that's an event. The concept is very simple. Everything in our system is either a command or an event. Commands lead to events and events may lead to new commands and so on.

## Event Sourcing is a style of application design where state changes are logged as a time-ordered sequence of records
**Event Sourcing is a style of application design where state changes are logged as a time-ordered sequence of records**
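
A minimal event sourcing sketch, assuming a hypothetical bank account: state is never stored directly, only an append-only, time-ordered log of changes, and replaying that log (fully or up to some point) reconstructs the current or a historic state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable state change (hypothetical account example)."""
    seq: int      # position in the time-ordered log
    kind: str     # "deposited" or "withdrawn"
    amount: int


def replay(events, upto=None):
    """Rebuild the balance by folding events in order; stop after seq `upto` if given."""
    balance = 0
    for e in sorted(events, key=lambda e: e.seq):
        if upto is not None and e.seq > upto:
            break
        if e.kind == "deposited":
            balance += e.amount
        elif e.kind == "withdrawn":
            balance -= e.amount
        else:
            raise ValueError(f"unknown event kind: {e.kind}")
    return balance


log = [Event(1, "deposited", 100), Event(2, "withdrawn", 30), Event(3, "deposited", 50)]
print(replay(log))          # 120 (current state)
print(replay(log, upto=2))  # 70  (historic state after event 2)
```

The same replay is what enables the audit, debugging and historic-state uses listed earlier.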

## Publisher subscriber rule

@@ -71,6 +71,7 @@ Event-driven architecture (EDA) means constructing your system as a series of co

- A **stream** provides immutable data. It supports only inserting (appending) new events, whereas existing events cannot be changed. Streams are persistent, durable, and fault tolerant. Events in a stream can be keyed, and you can have many events for one key, like "all of Bob's payments." If you squint a bit, you could consider a stream to be like a table in a relational database (RDBMS) that has no unique key constraint and that is append only.
- A **table** provides mutable data. New events - rows - can be inserted, and existing rows can be updated and deleted. Here, an event's key aka row key identifies which row is being mutated. Like streams, tables are persistent, durable, and fault tolerant. Today, a table behaves much like an RDBMS materialized view because it is being changed automatically as soon as any of its input streams or tables change, rather than letting you directly run insert, update, or delete operations against it.

| | **Stream** | **Table** |
|-------------------------------------------|------------|-------------|
| First event with key bob arrives | Insert | Insert |
@@ -88,6 +89,7 @@ Notwithstanding their differences, we can observe that there is a close relatio
In fact, a table is fully defined by its underlying change stream. If you have ever worked with a relational database such as Oracle or MySQL, these change streams exist there, too! Here, however, they are a hidden implementation detail - albeit an absolutely critical one - and have names like [redo log](https://docs.oracle.com/cd/B28359_01/server.111/b28310/onlineredo001.htm#ADMIN11302) or [binary log](https://dev.mysql.com/doc/internals/en/binary-log-overview.html). In event streaming, the redo log is much more than an implementation detail. It's a first-class entity: a stream. We can turn streams into tables and tables into streams, which is one reason why we say that event streaming and Kafka are [turning the database inside out](https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/).

![image](../../media/Event-driven-architecture-image3.jpg)

<https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming>

![image](../../media/Event-driven-architecture-image4.jpg)
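
The stream-table duality in plain Python (dicts and lists, not the Kafka Streams API): folding a keyed stream yields a table where the latest value per key wins, and diffing two table versions yields a change stream again. Using `None` as a delete marker is an assumption that mirrors Kafka's null-value tombstones.

```python
def stream_to_table(stream):
    """Materialize a table: the latest value per key wins; None deletes the key."""
    table = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_changes(old, new):
    """Turn a table update back into a change stream of (key, value) records."""
    changes = [(k, v) for k, v in new.items() if old.get(k) != v]
    changes += [(k, None) for k in old if k not in new]  # deletions as tombstones
    return changes


stream = [("bob", "Berlin"), ("alice", "Rome"), ("bob", "Paris")]
table = stream_to_table(stream)
print(table)  # {'bob': 'Paris', 'alice': 'Rome'}

updated = {"bob": "Paris", "carol": "Oslo"}
print(table_changes(table, updated))  # [('carol', 'Oslo'), ('alice', None)]
```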
@@ -179,8 +181,11 @@ Task queues manage background work that must be executed outside the usual HTTP
## Why are task queues necessary?

Tasks are handled asynchronously either because they are not initiated by an HTTP request or because they are long-running jobs that would dramatically reduce the performance of an HTTP response.

For example, a web application could poll the GitHub API every 10 minutes to collect the names of the top 100 starred repositories. A task queue would handle invoking code to call the GitHub API, process the results and store them in a persistent database for later use.

Another example is when a database query would take too long during the HTTP request-response cycle. The query could be performed in the background on a fixed interval with the results stored in the database. When an HTTP request comes in that needs those results a query would simply fetch the precalculated result instead of re-executing the longer query. This precalculation scenario is a form of [caching](https://www.fullstackpython.com/caching.html) enabled by task queues.
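
A stdlib-only sketch of the precalculation pattern: the request side just enqueues work, a background worker runs the slow job and stores the result, and later requests read the stored value. `slow_query` and the `results` dict are hypothetical stand-ins; production task queues such as Celery or RQ add a message broker, persistence, retries and scheduling.

```python
import queue
import threading

results = {}           # stands in for a persistent store such as a database table
tasks = queue.Queue()  # in-process queue; real systems use an external broker


def slow_query(n):
    """Hypothetical long-running job that would otherwise block an HTTP response."""
    return sum(i * i for i in range(n))


def worker():
    while True:
        n = tasks.get()
        if n is None:  # sentinel value: shut the worker down
            tasks.task_done()
            break
        results[n] = slow_query(n)  # precompute and store for later requests
        tasks.task_done()


threading.Thread(target=worker, daemon=True).start()

# The "web request" side only enqueues work and returns immediately.
for n in (10, 1000):
    tasks.put(n)
tasks.put(None)
tasks.join()  # demo only: wait so the results can be inspected

# A later request reads the precalculated result instead of recomputing it.
print(results[10])  # 285
```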

Other types of jobs for task queues include:

- spreading out large numbers of independent database inserts over time instead of inserting everything at once
@@ -193,6 +198,7 @@ Other types of jobs for task queues include

Message oriented middleware (MOM) refers to the software infrastructure supporting sending and receiving messages between distributed systems. AMQP and MQTT are the two most relevant protocols in this context. They are extensively used for exchanging messages since they provide an abstraction of the different participating system entities, alleviating their coordination and simplifying the communication programming details.
The basic idea of MOM is that communication takes place by adding messages to distributed queues, and by getting messages from those queues. Based on the model of Message Oriented Middleware, many protocols have been developed, e.g. DDS, STOMP, XMPP. The two most widely used proposals are: the Advanced Message Queuing Protocol (AMQP) and the Message Queuing Telemetry Transport (MQTT).

See also:

- AMQP
@@ -46,6 +46,7 @@ Functional decomposition is a term that engineers use to describe a set of steps
- UI patterns:
- [Server-side page fragment composition](https://microservices.io/patterns/ui/server-side-page-fragment-composition.html)
- [Client-side UI composition](https://microservices.io/patterns/ui/client-side-ui-composition.html)

<https://microservices.io/patterns/microservices.html>

## Catalog of patterns
29 changes: 13 additions & 16 deletions docs/computer-science/testing/load-performance-testing-qa-tools.md
@@ -86,13 +86,9 @@ Goad takes full advantage of the power of Amazon Lambdas for distributed load te
`ab` is a tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server. It is designed to give you an impression of how your current Apache installation performs.
## See Also
Locust
## HTTP Load Testing Tools
- [wrk](https://github.com/wg/wrk)
### - [wrk](https://github.com/wg/wrk)
```bash
wrk --duration 20s --threads 10 --connections 200 [URL]
@@ -102,7 +98,7 @@ wrk -c 5 -t 5 -d 99999 -H "Connection: Close" http://application-cpu
wrk -c 5 -t 5 -d 99999 -H "Connection: Close" https://facebook.com
```
- **Apache Bench - [Apache HTTP Server Benchmarking Tool](https://httpd.apache.org/docs/2.4/programs/ab.html) (for percentiles)**
### - **Apache Bench - [Apache HTTP Server Benchmarking Tool](https://httpd.apache.org/docs/2.4/programs/ab.html) (for percentiles)**
```bash
apt install apache2
@@ -129,26 +125,27 @@ ab -c 50 -n 5000 -s 90 -p data.json -T application/json -rk <https://staff.lende
```

- [Siege](https://github.com/JoeDog/siege) (for constant load)
### - [Siege](https://github.com/JoeDog/siege) (for constant load)

```bash
apt-get install -y siege
siege -c2 -t2m [URL]
```

- hey / boom
### - hey / boom

```bash
hey https://dev.example.com
# https://github.com/rakyll/hey
```

- <https://k6.io>
### - <https://k6.io>

Open source load testing tool and SaaS for engineering teams

- [**https://fortio.org/**](https://fortio.org/)
### - [**https://fortio.org/**](https://fortio.org/)

Fortio is a load testing library, command line tool, advanced echo server and web UI in Go (golang). It allows you to specify a set queries-per-second load and to record latency histograms and other useful stats.
Fortio runs at a specified queries-per-second (QPS) rate, records a histogram of execution time and calculates percentiles (e.g. p99, i.e. the response time such that 99% of the requests take less than that number, in seconds, the SI unit). It can run for a set duration, for a fixed number of calls, or until interrupted (at a constant target QPS, or at max speed/load per connection/thread).
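
To make the p99 definition concrete, here is a nearest-rank percentile over raw latency samples. This is one of several percentile definitions; Fortio computes its percentiles from a histogram, so its numbers may differ slightly. The sample latencies are hypothetical.

```python
import math


def percentile(latencies, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not latencies or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(latencies)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in seconds: 98 fast requests and 2 slow ones.
samples = [0.010] * 98 + [0.500, 1.200]
print(percentile(samples, 50))  # 0.01
print(percentile(samples, 99))  # 0.5
```

The median hides the slow tail entirely, which is why load testing tools report high percentiles such as p99.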
@@ -166,21 +163,19 @@ Fortio runs at a specified query per second (qps) and records an histogram of ex

<https://github.com/blueperf>

## References

<https://www.testingexcellence.com/top-10-open-source-performance-testing-tools>

## Locust

Locust is an easy-to-use, distributed, user load testing tool. It is intended for load-testing websites (or other systems) and figuring out how many concurrent users a system can handle.

Locust is a scalable load testing framework written in Python.

Locust is completely event-based, and therefore it's possible to support thousands of concurrent users on a single machine. In contrast to many other event-based apps, it doesn't use callbacks. Instead, it uses lightweight processes through [gevent](http://www.gevent.org/). Each locust swarming your site is actually running inside its own process (or greenlet, to be correct). This allows you to write very expressive scenarios in Python without complicating your code with callbacks.

## Running Locust Distributed
### Running Locust Distributed

You start one instance of Locust in master mode using the `--master` flag. This is the instance that runs Locust's web interface, where you start the test and see live statistics. The master node doesn't simulate any users itself. Instead, you have to start one or (most likely) multiple slave Locust nodes using the `--slave` flag, together with `--master-host` (to specify the IP/hostname of the master node). In Locust 1.0 and later, `--slave` was renamed `--worker`.

## Commands
### Commands

```bash
locust -f tasks.py --host http://localhost:5000
@@ -271,3 +266,5 @@ Subscribe
## Others

<https://aws.amazon.com/about-aws/whats-new/2021/05/introducing-distributed-load-testing-v1-3>

<https://www.testingexcellence.com/top-10-open-source-performance-testing-tools>
1 change: 1 addition & 0 deletions docs/computer-science/testing/terms.md
@@ -328,6 +328,7 @@ In order to keep a system secure, it is advisable to conduct a pentest on a regu
- [Web Application Testing](https://www.tutorialspoint.com/software_testing_dictionary/web_application_testing.htm)
- [White box Testing](https://www.tutorialspoint.com/software_testing_dictionary/white_box_testing.htm)
- [Workflow Testing](https://www.tutorialspoint.com/software_testing_dictionary/workflow_testing.htm)

<https://dev.to/maxwell_dev/the-testing-introduction-i-wish-i-had-2dn>

<https://dev.to/conw_y/towards-zero-bugs-1bop>
2 changes: 1 addition & 1 deletion docs/data-structures/graph/digraphs-directed-graphs.md
@@ -40,4 +40,4 @@ We use adjacency-list representation for representing a weighted graph, where ea

## See also

- Topological Sort
[Topological Sort](algorithms/graphtheory/topological-sort-algorithm.md)
1 change: 1 addition & 0 deletions docs/data-structures/hashtable/count-min-sketch.md
@@ -5,4 +5,5 @@ Count-min sketch is a probabilistic data structure that is used to count the fre
Count-Min sketch is a probabilistic sub-linear space streaming algorithm. It is somewhat similar to a Bloom filter. The main difference is that a Bloom filter represents a set as a bitmap, while a Count-Min sketch represents a multiset, keeping a summary of its frequency distribution.

![image](../../media/Count-min-Sketch-image1.jpg)

<https://youtu.be/ibxXO-b14j4>
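
A minimal Count-Min sketch: `depth` rows of `width` counters, one hash function per row. Adding an item increments one counter per row; the estimate is the minimum across rows, because hash collisions can only inflate counters. Deriving per-row hashes by salting BLAKE2b with the row number is an assumption made for this sketch.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory (width x depth counters)."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Per-row hash: salt the item with the row number before hashing.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum counter is the tightest upper bound on the true count.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch(width=50, depth=4)
for word in ["a", "b", "a", "c", "a", "b"]:
    cms.add(word)
print(cms.estimate("a"))  # at least 3: may overestimate, never underestimates
```

Unlike a Bloom filter, which only answers "possibly present / definitely absent", the sketch answers "how many times, at most".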
1 change: 1 addition & 0 deletions docs/data-structures/hashtable/hashing-techniques.md
@@ -74,6 +74,7 @@ A common optimization (so common in fact, that it is almost to be considered a p
Example: A coalescing hash table array with *M* = 10 and *N* = 3

![image](../../media/Hashing-Techniques-image5.jpg)

<https://programming.guide/coalesced-hashing.html>
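
A minimal sketch of coalesced hashing without a cellar, assuming integer keys, Python's built-in `hash`, and free slots taken from the bottom of the array. Each slot stores a key and a `next` index; a collision places the key in a free slot and links it onto the chain that passes through its home slot, which is how chains from different home slots end up merged ("coalesced"). Deletion is not supported.

```python
class CoalescedHashTable:
    def __init__(self, size):
        self.keys = [None] * size
        self.next = [-1] * size  # -1 marks the end of a chain
        self.free = size - 1     # scans downward for an empty slot

    def _home(self, key):
        return hash(key) % len(self.keys)

    def insert(self, key):
        i = self._home(key)
        if self.keys[i] is None:
            self.keys[i] = key
            return i
        # Walk to the end of the chain, stopping early if the key is present.
        while True:
            if self.keys[i] == key:
                return i
            if self.next[i] == -1:
                break
            i = self.next[i]
        # Take the next free slot from the bottom of the table.
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise RuntimeError("table is full")
        self.keys[self.free] = key
        self.next[i] = self.free  # link the new slot onto the end of the chain
        return self.free

    def contains(self, key):
        i = self._home(key)
        while i != -1 and self.keys[i] is not None:
            if self.keys[i] == key:
                return True
            i = self.next[i]
        return False


t = CoalescedHashTable(10)
for k in (12, 22, 32, 9):
    t.insert(k)
print(t.keys)  # [None, None, 12, None, None, None, None, 9, 32, 22]
```

Key 9 hashes to slot 9, which is already occupied by 22 from slot 2's chain, so its chain coalesces with that one: a lookup for 9 walks 9 -> 8 -> 7.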

### Robin hood hashing
6 changes: 4 additions & 2 deletions docs/databases/nosql-databases/aws-dynamodb/working.md
@@ -39,8 +39,10 @@ Replicated write capacity unit (rWCU) - One **read capacity unit** represents one s
- One **write capacity unit** represents one write per second for items up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.

![image](../../../media/AWS-DynamoDB_Working-image1.jpg)

- Secondary Indexes
- Local secondary indexes
- Global secondary indexes (asynchronous)

- Local secondary indexes
- Global secondary indexes (asynchronous)

<https://aws.amazon.com/dynamodb/pricing/provisioned>
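
The write capacity rule (one WCU per write per second for each started 1 KB of item size) as a quick calculator. This sketch covers standard writes only and assumes item sizes are given in KB; transactional writes consume more capacity per AWS's pricing documentation.

```python
import math


def write_capacity_units(item_size_kb, writes_per_second):
    """Standard writes: each write consumes ceil(item size / 1 KB) WCUs."""
    if item_size_kb <= 0 or writes_per_second < 0:
        raise ValueError("item size must be positive and write rate non-negative")
    return math.ceil(item_size_kb) * writes_per_second


print(write_capacity_units(0.5, 10))  # 10 (each write rounds up to 1 KB)
print(write_capacity_units(3.5, 10))  # 40 (each 3.5 KB write needs 4 units)
```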

## NoSQL Data Modeling
1 change: 1 addition & 0 deletions docs/databases/nosql-databases/cassandra/working.md
@@ -54,6 +54,7 @@
- A consistency level below ALL performs read repair in the background (`read_repair_chance`, default 10% of reads)

![image](../../../media/Cassandra_Working-image2.jpg)

**Compaction**

- Data updates accumulate over time and SSTables and logs need to be compacted
- The process of compaction merges SSTables, i.e., by merging updates for a key
- Runs periodically and locally at each server
- TimeWindowCompactionStrategy
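
A simplified sketch of what compaction does: merge several sorted SSTables of `(key, timestamp, value)` rows into one sorted table, keeping only the newest version of each key. Using `None` as a tombstone and dropping tombstoned keys immediately are simplifications; Cassandra itself keeps tombstones until `gc_grace_seconds` has passed so that deletes propagate to all replicas.

```python
import heapq


def compact(*sstables):
    """Merge sorted SSTables, keeping the newest (highest timestamp) version per key."""
    newest = {}
    # heapq.merge streams the already-sorted inputs in (key, timestamp) order.
    for key, ts, value in heapq.merge(*sstables, key=lambda row: (row[0], row[1])):
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, value)
    # Drop keys whose newest version is a tombstone (None) in this sketch.
    return [(k, ts, v) for k, (ts, v) in sorted(newest.items()) if v is not None]


older = [("alice", 1, "Rome"), ("bob", 1, "Berlin"), ("carol", 1, "Oslo")]
newer = [("bob", 5, "Paris"), ("carol", 6, None)]  # None marks a delete
print(compact(older, newer))  # [('alice', 1, 'Rome'), ('bob', 5, 'Paris')]
```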
1 change: 1 addition & 0 deletions docs/databases/nosql-databases/druid/architecture.md
@@ -25,6 +25,7 @@ Druid's process types are:
- Returning supervisor and task status to callers
- [**Router**](http://druid.io/docs/latest/development/router.html) processes are optional processes that provide a unified API gateway in front of Druid Brokers, Overlords, and Coordinators. They are optional since you can also simply contact the Druid Brokers, Overlords, and Coordinators directly.

![image](../../../media/Druid_Architecture-image1.jpg)

<https://docs.imply.io/cloud/design>
Druid processes can be deployed individually (one per physical server, virtual server, or container) or can be colocated on shared servers. One common colocation plan is a three-type plan:
