Distributed Systems Interview Checklist

  1. Operating systems (fundamentals)
    • GFS (high availability, scalability, data replication, erasure coding)
    • ext4 (disk layout, I/O scheduler, performance tuning): working knowledge
    • btrfs (a newer file system still under active development; know its disk layout, snapshots, and data integrity features; the btrfs wiki is a good place to start): working knowledge
    • OS kernel (page cache): working knowledge
      1. Processes (IPC mechanisms such as message queues, shared memory, and pipes; differences between processes and threads)
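To make the "erasure coding" item under GFS concrete, here is a minimal sketch of the simplest erasure code: a single XOR parity block over k data blocks (the RAID-5 idea). It tolerates the loss of any one block; production systems typically use Reed-Solomon codes that survive several simultaneous losses. The function names encode, recover, and decode are hypothetical and not taken from any particular file system.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal, zero-padded data blocks plus one XOR parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, blocks)
    return blocks + [parity]


def recover(blocks: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) by XOR-ing the survivors.

    This works because the XOR of all k data blocks and the parity block is zero.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can only tolerate one lost block")
    blocks = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        blocks[missing[0]] = reduce(xor_bytes, survivors)
    return blocks


def decode(blocks: list[bytes], length: int) -> bytes:
    """Concatenate the data blocks (everything except parity) and strip padding."""
    return b"".join(blocks[:-1])[:length]
```

For example, encode a payload with k=4, set any one of the five returned blocks to None, and decode(recover(blocks), len(payload)) returns the original bytes. The storage overhead is 1/k instead of the 2x extra that three-way replication costs, at the price of reading k blocks to rebuild one.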
  2. Storage (fundamentals and design)
    • In-memory key-value stores (Redis) (in-memory data structures, hashing algorithms, distributed data partitioning algorithms, data availability)
    • Distributed databases (HBase) (CAP, BASE, multi-version concurrency control, WAL, LSM tree)
    • MySQL (InnoDB logs, InnoDB data structures)
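As one concrete data partitioning scheme for the Redis item, Redis Cluster assigns every key to one of 16384 hash slots using CRC16(key) mod 16384, and "hash tags" in braces let related keys share a slot. The sketch below assumes Python's binascii.crc_hqx with an initial value of 0 matches the CRC-16/XMODEM variant Redis uses; the helper names are hypothetical.

```python
import binascii

NUM_SLOTS = 16384  # Redis Cluster splits the key space into 16384 hash slots


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), computed here via binascii.crc_hqx."""
    return binascii.crc_hqx(data, 0)


def hash_tag(key: bytes) -> bytes:
    """Return the portion of the key that is actually hashed.

    If the key has a '{' followed later by a '}' with at least one byte between
    them, only the bytes between the first '{' and the first '}' after it count.
    Otherwise (no braces, or an empty '{}'), the whole key is hashed.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to its hash slot in [0, NUM_SLOTS)."""
    return crc16_xmodem(hash_tag(key.encode())) % NUM_SLOTS
```

With this scheme, key_slot("user:{1000}:name") and key_slot("user:{1000}:email") are equal, so both keys live on the same node and can be used together in multi-key commands. Rebalancing moves whole slots between nodes rather than rehashing individual keys.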
  3. Networking
    • TCP/IP connection management (three-way handshake, four-way teardown, RST packets, timeouts, RTT, MSL, the TCP state machine)
    • Zero copy (the sendfile system call, the kernel's data copy path)
    • RPC (I/O multiplexing with epoll, select, and poll; interrupt mechanisms)
    • Data distribution
    • Routing
    • Data migration (load balancing, tiered storage across memory, SSD, and disk)
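The TCP handshake, teardown, RST, and MSL items all hang off the TCP state machine. Below is a simplified, table-driven sketch of the RFC 793 states. Event names such as rcv_SYN_ACK are hypothetical labels; simultaneous open is omitted, and RST handling is reduced to "abort to CLOSED" from the listed states.

```python
from __future__ import annotations

# (state, event) -> next state. Events are seen from one endpoint; comments
# note the segment that endpoint sends on the transition.
TCP_TRANSITIONS = {
    # Connection establishment (three-way handshake)
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN_SENT",       # send SYN
    ("LISTEN", "rcv_SYN"): "SYN_RCVD",           # send SYN+ACK
    ("SYN_SENT", "rcv_SYN_ACK"): "ESTABLISHED",  # send ACK
    ("SYN_RCVD", "rcv_ACK"): "ESTABLISHED",
    # Connection termination (four-way teardown)
    ("ESTABLISHED", "close"): "FIN_WAIT_1",      # active close: send FIN
    ("ESTABLISHED", "rcv_FIN"): "CLOSE_WAIT",    # passive close: send ACK
    ("FIN_WAIT_1", "rcv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "rcv_FIN"): "CLOSING",        # simultaneous close: send ACK
    ("FIN_WAIT_2", "rcv_FIN"): "TIME_WAIT",      # send ACK
    ("CLOSING", "rcv_ACK"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",         # send FIN
    ("LAST_ACK", "rcv_ACK"): "CLOSED",
    ("TIME_WAIT", "timeout_2MSL"): "CLOSED",     # linger 2 * MSL, then release
}

# Simplification: an incoming RST aborts the connection from these states.
RST_TO_CLOSED = {"SYN_SENT", "ESTABLISHED", "FIN_WAIT_1", "FIN_WAIT_2",
                 "CLOSE_WAIT", "CLOSING", "LAST_ACK"}


def step(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for an unmodeled event."""
    if event == "rcv_RST" and state in RST_TO_CLOSED:
        return "CLOSED"
    try:
        return TCP_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not modeled in state {state!r}") from None


def run_events(state: str, events: list[str]) -> list[str]:
    """Apply events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = step(state, event)
        visited.append(state)
    return visited
```

For an active opener that also closes first, run_events("CLOSED", ["active_open", "rcv_SYN_ACK", "close", "rcv_ACK", "rcv_FIN", "timeout_2MSL"]) walks CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED. Only the side that closes first passes through TIME_WAIT, which is why busy clients, not servers, usually accumulate TIME_WAIT sockets.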
  4. Computation
    • MapReduce (Hadoop, Spark) (RDDs, shuffle, task schedulers in distributed computing frameworks, such as FIFO, Fair, and Capacity)
    • Stream processing (Flink, Heron, Spark Streaming) (differences between real-time and batch computing)
    • Graph computation (GraphX, Pregel) (BSP and SSP models)
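The map, shuffle, and reduce phases behind the MapReduce item can be illustrated with a single-process word count. This is a sketch only: real Hadoop or Spark jobs spill map output to disk, sort it, and fetch partitions over the network. The function names are hypothetical, and zlib.crc32 is an assumed stand-in for the framework's hash partitioner, chosen because Python's built-in hash() of a str is randomized per process.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in doc.lower().split():
        yield word, 1


def partition(key: str, num_reducers: int) -> int:
    """Hash partitioner: the same key always goes to the same reducer."""
    return zlib.crc32(key.encode()) % num_reducers


def shuffle(pairs: Iterable[tuple[str, int]], num_reducers: int) -> list[dict[str, list[int]]]:
    """Route each pair to its reducer partition and group values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return partitions


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(docs: list[str], num_reducers: int = 3) -> dict[str, int]:
    mapped = (pair for doc in docs for pair in map_phase(doc))
    result: dict[str, int] = {}
    for part in shuffle(mapped, num_reducers):
        for key in sorted(part):  # each reducer processes its keys in sorted order
            word, count = reduce_phase(key, part[key])
            result[word] = count
    return result
```

The shuffle is the expensive step in practice: every mapper may send data to every reducer, which is why Spark distinguishes narrow dependencies (no shuffle) from wide ones (shuffle) when building stages from RDD lineage.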
  5. High concurrency
    • Producer-consumer pattern
    • Lock-free data structures (linked lists, maps; consistency guarantees; see the Wikipedia entry)
    • CAS (compare-and-swap; overhead of volatile accesses)
    • Reentrancy (see the Wikipedia entry)
    • Kafka
    • Reactor pattern
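For the producer-consumer item, a minimal sketch using a bounded queue.Queue: producers block when the buffer is full (back-pressure), consumers block when it is empty, and one sentinel per consumer signals shutdown. Note that queue.Queue is lock-based internally, so this illustrates the pattern, not a lock-free structure. The function names and the SENTINEL convention are hypothetical choices for the example.

```python
from __future__ import annotations

import queue
import threading

SENTINEL = object()  # tells a consumer that no more items will arrive


def producer(q: queue.Queue, items: range) -> None:
    for item in items:
        q.put(item)  # blocks while the bounded queue is full


def consumer(q: queue.Queue, results: list[int], lock: threading.Lock) -> None:
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        with lock:  # protect the shared results list
            results.append(item * item)


def run_pipeline(num_producers: int = 2, num_consumers: int = 3, per_producer: int = 100) -> list[int]:
    q: queue.Queue = queue.Queue(maxsize=10)
    results: list[int] = []
    lock = threading.Lock()
    producers = [
        threading.Thread(target=producer, args=(q, range(p * per_producer, (p + 1) * per_producer)))
        for p in range(num_producers)
    ]
    consumers = [threading.Thread(target=consumer, args=(q, results, lock)) for _ in range(num_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()  # every real item is enqueued before any sentinel
    for _ in consumers:
        q.put(SENTINEL)  # FIFO order: sentinels arrive after all real items
    for t in consumers:
        t.join()
    return results
```

Because all producers are joined before the sentinels are enqueued, and the queue is FIFO, each consumer drains real items before it sees its sentinel, so no work is lost. Kafka applies the same decoupling across machines, with a durable log in place of the in-memory buffer.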