Elaineflying edited this page Nov 24, 2020 · 1 revision

SQL DBs

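----What are windowing functions? https://mode.com/sql-tutorial/sql-window-functions/ https://zhuanlan.zhihu.com/p/60226935 First, think of a "window" as a set: one window is one set. over (partition by a order by b) from T means: group the rows of table T by column a, then sort the rows within each group by column b. If we then apply a ranking function, we get a new column whose value is each row's rank within its own group. partition by is optional. Without partition by, the whole table is treated as a single set, and the ranking function returns each row's rank by column b.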
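A minimal sketch of over (partition by a order by b) using Python's built-in sqlite3 module. The table T and its rows are made up for illustration, and the example assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical table T: column a is the grouping key, column b the sort key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO T (a, b) VALUES (?, ?)",
    [("x", 30), ("x", 10), ("y", 20), ("x", 25), ("y", 5)],
)

# With PARTITION BY: each value of a is its own window, ranked by b.
partitioned = conn.execute(
    "SELECT a, b, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS rn "
    "FROM T ORDER BY a, b"
).fetchall()

# Without PARTITION BY: the whole table is one window, ranked by b.
whole_table = conn.execute(
    "SELECT a, b, ROW_NUMBER() OVER (ORDER BY b) AS rn FROM T ORDER BY b"
).fetchall()

for row in partitioned:
    print(row)
```

In partitioned the numbering restarts at 1 for every value of a; in whole_table it runs from 1 to 5 across all rows.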

----What is a stored procedure? https://www.w3schools.com/sql/sql_stored_procedures.asp SQL Server stored procedures are used to group one or more Transact-SQL statements into logical units. Stored procedures are stored as named objects in the SQL Server database server.

When you call a stored procedure for the first time, SQL Server creates an execution plan and stores it in the cache. On later executions, SQL Server reuses the cached plan instead of compiling a new one, so the procedure typically runs faster and with more predictable performance.

----Why would you use them?

----What are atomic attributes? https://logicalread.com/sql-server-entity-relationship-model-mc03/#.X70J6dMzZfU https://blog.csdn.net/kinglyjn/article/details/54379882 Atomicity

-----Explain ACID properties of a database ACID (Atomicity, Consistency, Isolation, Durability) https://blog.csdn.net/shfqbluestone/article/details/52007011
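A small sketch of the "A" in ACID, again with sqlite3. The accounts table, the names, and the transfer helper are hypothetical; the point is only that both updates of a transfer are committed together or rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
print(ok, failed, final)
```

The second transfer credits bob first and then fails on the debit, so the rollback also undoes the credit. That is atomicity: no half-finished transfer is ever visible.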

-----How to optimize queries? https://www.sisense.com/blog/8-ways-fine-tune-sql-queries-production-databases/ https://zhuanlan.zhihu.com/p/102809323 https://dbaplus.cn/news-155-1531-1.html

-----What are the different types of JOIN (CROSS, INNER, OUTER)? file:///Users/elaine/Desktop/Screen%20Shot%202020-11-24%20at%209.49.19%20PM.png file:///Users/elaine/Desktop/Screen%20Shot%202020-11-24%20at%209.59.18%20PM.png https://dataschool.com/how-to-teach-people-sql/sql-join-types-explained-visually/
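To make the join types concrete, here is a rough sketch with two made-up tables (employees and departments) in sqlite3. Only INNER, LEFT OUTER and CROSS are shown, since RIGHT and FULL OUTER JOIN are only available in newer SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Cy has no department, Legal has no employees.
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (id INTEGER, dept TEXT);
    INSERT INTO employees VALUES ('Ann', 1), ('Bo', 2), ('Cy', NULL);
    INSERT INTO departments VALUES (1, 'Data'), (2, 'Ops'), (3, 'Legal');
    """
)


def rows(sql):
    return conn.execute(sql).fetchall()


# INNER JOIN: only rows with a match on both sides.
inner = rows(
    "SELECT e.name, d.dept FROM employees e "
    "INNER JOIN departments d ON e.dept_id = d.id ORDER BY e.name"
)

# LEFT OUTER JOIN: every employee, with NULL where no department matches.
left_outer = rows(
    "SELECT e.name, d.dept FROM employees e "
    "LEFT OUTER JOIN departments d ON e.dept_id = d.id ORDER BY e.name"
)

# CROSS JOIN: every employee paired with every department (3 x 3 rows).
cross = rows("SELECT e.name, d.dept FROM employees e CROSS JOIN departments d")

print(inner)
print(left_outer)
print(len(cross))
```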

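-----What is the difference between Clustered Index and Non-Clustered Index - with examples? https://www.geeksforgeeks.org/difference-between-clustered-and-non-clustered-index/ https://www.jianshu.com/p/5681ebd5b0ef A clustered index is like the table of contents at the front of a book: it lets you quickly find the content you are looking for. A non-clustered index is like the glossary in the last few pages of a book, which explains particular phrases or new words from the text, just as the geography textbooks we used in school had English explanations of place names at the back.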

The Cloud

----What is serverless? https://www.infoq.cn/article/SkLy3mGHNiKGVMVGXhT0 https://serverless-stack.com/chapters/zh/what-is-serverless.html https://serverless-stack.com/chapters/what-is-serverless.html Serverless architecture means an application uses third-party functions and services without having to manage servers. It mainly covers two aspects:

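FaaS (Function as a Service): stateless functions that contain server-side business logic. These functions run in isolated containers, are event-driven, and are hosted by a third-party vendor, such as AWS Lambda or Azure Functions.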

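BaaS (Backend as a Service): using third-party services (such as Firebase or Auth0) to get the job done. Applications built on BaaS are usually rich-client applications such as SPAs or mobile apps. The client handles most of the business logic, and the rest relies on external services for things like authentication, databases, and user management.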
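As a rough illustration of the FaaS side, the sketch below follows the handler(event, context) shape that AWS Lambda uses for Python functions. The event field name and the response layout are assumptions made for this example, not something taken from the linked articles.

```python
import json


def handler(event, context):
    """Stateless function: everything it needs arrives in the event."""
    name = event.get("name", "world")  # "name" is a hypothetical event field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local call for illustration; on a FaaS platform the provider invokes
# handler once per incoming event and manages the servers behind it.
response = handler({"name": "reader"}, None)
print(response)
```

The function keeps no state between calls, which is what lets the platform start and stop containers freely.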

----What is the difference between IaaS, PaaS and SaaS? https://www.bmc.com/blogs/saas-vs-paas-vs-iaas-whats-the-difference-and-how-to-choose/ file:///Users/elaine/Desktop/Screen%20Shot%202020-11-24%20at%2010.10.20%20PM.png

----How do you move from the ingest layer to the consumption layer? (In Serverless)

----What is edge computing? https://www.theverge.com/circuitbreaker/2018/5/7/17327584/edge-computing-cloud-google-microsoft-apple-amazon

----What is the difference between cloud and edge and on-premise? https://phoenixnap.com/blog/edge-computing-vs-cloud-computing https://www.cleo.com/blog/knowledge-base-on-premise-vs-cloud

Linux

---What is crontab? https://kb.iu.edu/d/afiz https://linuxtools-rst.readthedocs.io/zh_CN/latest/tool/crontab.html https://zhuanlan.zhihu.com/p/58719487 On Linux, crontab is mainly used to manage periodically scheduled tasks.
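A crontab entry is five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below just splits one entry into those parts; the helper name and the backup script path are made up for illustration.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_crontab_line(line):
    """Split one crontab entry into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most 6 pieces; the last keeps its spaces
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    entry = dict(zip(FIELDS, parts[:5]))
    entry["command"] = parts[5]
    return entry


# Hypothetical entry: run a backup script at 02:30 every Monday.
entry = parse_crontab_line("30 2 * * 1 /usr/local/bin/backup.sh --full")
print(entry)
```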

Big Data

What are the 4 V's?

Which one is most important?

Kafka

What is a topic?

How to ensure FIFO?

How do you know if all messages in a topic have been fully consumed?

What are brokers?

What are consumer groups?

What is a producer?

Coding

What is the difference between an object and a class?

Explain immutability

What are AWS Lambda functions and why would you use them?

Difference between library, framework and package

How to reverse a linked list
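One common answer is the iterative pointer-flip, sketched here for a singly linked list. The Node class and the to_list helper are defined only for this example.

```python
class Node:
    """Minimal singly linked list node (hypothetical, for illustration)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse(head):
    """Reverse the list in place and return the new head."""
    prev = None
    current = head
    while current is not None:
        following = current.next  # remember the rest of the list
        current.next = prev       # point this node backwards
        prev = current
        current = following
    return prev


def to_list(head):
    """Collect node values into a Python list for easy inspection."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


reversed_values = to_list(reverse(Node(1, Node(2, Node(3)))))
print(reversed_values)
```

This runs in O(n) time and O(1) extra space; a recursive version is shorter to write but uses O(n) stack.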

Difference between args and kwargs
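In Python, *args collects extra positional arguments into a tuple and **kwargs collects extra keyword arguments into a dict. A short sketch (the function name describe is made up):

```python
def describe(*args, **kwargs):
    """Return the positional arguments (tuple) and keyword arguments (dict)."""
    return args, kwargs


# Packing: extra arguments are gathered at call time.
positional, keyword = describe(1, 2, mode="fast")

# Unpacking: the same operators spread a sequence or mapping into a call.
values = (3, 4)
options = {"mode": "slow"}
positional2, keyword2 = describe(*values, **options)

print(positional, keyword)
print(positional2, keyword2)
```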

Difference between OOP and functional programming

NoSQL DBs

What is a key-value (rowstore) store?

What is a columnstore?

Difference between a row store and a column store

What is a document store?

Difference between Redshift and Snowflake

Hadoop

What file formats can you use in Hadoop?

What is the difference between a namenode and a datanode?

What is HDFS?

What is the purpose of YARN?

Lambda Architecture

What is streaming and batching?

What is the upside of streaming vs batching?

What is the difference between lambda and kappa architecture?

Can you sync the batch and streaming layer and if yes how?

Python

Difference between list, tuple, and dictionary

Data Warehouse & Data Lake

-----What is a data lake? https://www.talend.com/resources/data-lake-vs-data-warehouse/

-----What is a data warehouse? https://www.jianshu.com/p/a3a62402edaa

-----Are there data lake warehouses?

Two data lakes within single warehouse?

------What is a data mart? https://www.modb.pro/db/24834 https://www.jianshu.com/p/329aea918956 file:///Users/elaine/Desktop/Screen%20Shot%202020-11-24%20at%2010.34.57%20PM.png

What is a slowly changing dimension (types)?

------What is a surrogate key and why use them? https://blog.csdn.net/PacificPeng/article/details/38372599 https://www.jianshu.com/p/652bb3908db5 Surrogate key

APIs (REST)

What does REST mean?

What is idempotency?

What are common REST API frameworks (Jersey and Spring)?

Apache Spark

What is an RDD?

What is a dataframe?

What is a dataset?

How is a dataset typesafe?

What is Parquet?

What is Avro?

Difference between Parquet and Avro

Tumbling Windows vs. Sliding Windows

Difference between batch and stream processing

What are microbatches?

MapReduce

What is a use case of MapReduce?

Write pseudocode for word count
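A runnable stand-in for the pseudocode, written as plain Python rather than against any real MapReduce framework. The function names map_phase, shuffle and reduce_phase are made up to mirror the three stages.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(["the quick fox", "The lazy dog", "the end"])
print(counts)
```

In a real cluster the map calls run in parallel on different input splits and the framework performs the shuffle between machines; here everything runs in one process to keep the stages visible.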

What is a combiner?

Docker & Kubernetes

What is a container?

Difference between Docker Container and a Virtual PC

What is the easiest way to learn Kubernetes fast?

Data Pipelines

What is an example of a serverless pipeline?

What is the difference between at most once vs at least once vs exactly once?

What systems provide transactions?

What is an ETL pipeline?

Airflow

What is a DAG (in the context of Airflow/Luigi)?

What are hooks/is a hook?

What are operators?

How to branch?

Data Visualization

What is a BI tool?

Security/Privacy

What is Kerberos?

What is a firewall?

What is GDPR?

What is anonymization?

Distributed Systems

How do clusters reach consensus? (The answer was to use consensus protocols like Paxos or Raft.) Good thing I didn't have to explain Paxos.

What is the CAP theorem? Explain it. (What factors should be considered when choosing a DB?)

How to choose the right storage for different data consumers? It's always a tricky question.

Apache Flink

What is Flink used for?

Flink vs Spark?

GitHub

What are branches?

What are commits?

What's a pull request?

Dev/Ops

What is continuous integration?

What is continuous deployment?

Difference between CI and CD

Development / Agile

What is Scrum?

What is OKR?

What is Jira and what is it used for?
