
Welcome

docker storage

Docker has three ways of using storage on the host:

  • bind: maps a host directory into the container. Start the container with -v dir1:dir2 to mount host directory dir1 onto dir2 in the container; under the hood this runs something like mount -o bind.
  • volume: storage managed by Docker itself; create it first, then use it. Volumes can be shared among multiple containers and are the officially recommended mechanism.
  • tmpfs: uses host memory as storage, mounting an in-memory filesystem onto a container directory via --mount type=tmpfs,destination=/dir. A tmpfs is released as soon as the container exits.

Which filesystem Docker uses

The filesystem Docker uses has changed many times and may differ across distributions, but the current mainstream choice is overlay2. Run docker info to confirm that overlay2 is in use:

sudo docker info | grep Storage
 Storage Driver: overlay2

Besides overlay2 there are aufs (Ubuntu), devicemapper (CentOS), btrfs, and zfs. Their implementations all differ, but each supports layering and copy-on-write (CoW); because they implement these differently, their performance differs as well.

  • Layering: images are layered; during a Dockerfile build, every COPY/RUN step adds a layer.
  • Copy-on-write: when a container (or a Dockerfile step) modifies a file, including a permission change, the file is first copied up from the lower layer into the container layer and then modified.

A container is simply a writable layer pushed on top of the image layers, and a temporary one: when the container is destroyed, the files in this layer are deleted with it.

Advantages of overlay

  1. Page caching: the page cache can be shared between multiple instances.
  2. Identical files in different layers are hard links, saving inodes and disk space.

Copy-on-write (copy-up) adds latency to the first write to a file, especially a large file, which takes time to copy. Subsequent writes have no such delay, and since overlay2 caches, it reduces this latency further compared with the other drivers.

Problems with overlay

  1. The implementation is incomplete; for example, rename(2) is not fully supported (it can fail with EXDEV).
  2. If you first open a file read-only with open(read) and then open the same file for writing with open(write), the two fds refer to two different files: the first points at the file in the lower layer, while the second triggers a copy-up and points at the copy in the container layer.
  3. The workaround is to touch the file first. A real-world example: yum needs the yum-plugin-ovl plugin, which is only available from CentOS 7.2 on; on earlier releases you have to touch /var/lib/rpm/* first.

Best practices

  1. Use SSDs.
  2. For write-heavy workloads such as databases, use a bind mount or a volume: this bypasses overlay's extra machinery and uses the host filesystem directly.

How overlay handles reads, writes, and deletes

With a container running, look at the mounts:

overlay on /var/lib/docker/overlay2/04ea1faa8074e5862f40eecdba968bd9b7f222cb30e5bf6a0b9a9c48be0940f2/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/B74PWZCBMRCWXFH5UL2ZXB5WEU:/var/lib/docker/overlay2/l/WNHICVPVSDNUGSCZW435TPSMOK,upperdir=/var/lib/docker/overlay2/04ea1faa8074e5862f40eecdba968bd9b7f222cb30e5bf6a0b9a9c48be0940f2/diff,workdir=/var/lib/docker/overlay2/04ea1faa8074e5862f40eecdba968bd9b7f222cb30e5bf6a0b9a9c48be0940f2/work)
Docker mounts the image layers read-only and the container layer read-write. The filesystem thus has two parts: upper (the container layer) + lower (the image layers).

  • On a write inside the container, if the file is not in the upper layer, the lower layers are searched in order; if found there it is copied up, and if it exists nowhere a new file is created in the upper layer.
  • Reads follow the same lookup order.
  • On delete, a whiteout file is created to hide the file. This is why a layer keeps growing even when you delete files inside the container.
  • Deleting a directory works much the same way.

Special cases

After you commit a container (docker commit), one more layer is added containing the modified files plus the whiteout files produced by deletions, and an image is generated from it.

The following special files, however, are never committed, because Docker rewrites them at run time according to user configuration:

  1. /etc/hostname
  2. /etc/hosts
  3. /etc/resolv.conf

For example, Docker's link option writes a container-name -> container-IP entry into the container's hosts file.

An example of mounting overlayfs by hand

  1. Initially the files are spread across directories A, B, and C:
    .
    ├── A
    │   ├── aa
    │   └── a.txt
    ├── B
    │   ├── a.txt
    │   └── b.txt
    ├── C
    │   └── c.txt
    └── worker
        └── work [error opening dir]
    
  2. Mount the overlay at /tmp/test: sudo mount -t overlay overlay -o lowerdir=A:B,upperdir=C,workdir=worker /tmp/test/
  3. Inspect /tmp/test:
    /tmp/test/
    ├── aa
    ├── a.txt
    ├── b.txt
    └── c.txt
    
    mount  | grep 'overlay'
    overlay on /tmp/test type overlay (rw,relatime,lowerdir=A:B,upperdir=C,workdir=worker)
    

References

https://docs.docker.com/storage/storagedriver/

How this started

I once got an interview question: how many 'A's does the following program print?

#include <stdio.h>
#include <unistd.h>

int main() {
  printf("A");
  fork();
  printf("A");
}
The answer is 4 (when run on a terminal), while the following prints 3:
#include <stdio.h>
#include <unistd.h>

int main() {
  printf("A\n");
  fork();
  printf("A");
}
Redirect the compiled binary of the second program to a file, however, and you get 4 'A's:
#./a.out > r.log
#cat r.log
The reason is buffering.
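The buffer-duplication effect can be reproduced outside C as well. A minimal Python sketch (it assumes a POSIX system, since it relies on os.fork), which pipes the child's output so its stdout is fully buffered, just like the C program redirected to a file:

```python
import subprocess
import sys
import textwrap

# Child script: its stdout is a pipe here, so it is fully buffered.
script = textwrap.dedent("""
    import os, sys
    sys.stdout.write("A")   # stays in the stdio buffer (no flush yet)
    os.fork()               # both processes now hold a copy of that buffer
    sys.stdout.write("A")   # each process appends its own second "A"
    # at exit, parent and child each flush "AA"
""")

out = subprocess.run([sys.executable, "-c", script],
                     capture_output=True, text=True).stdout
print(out.count("A"))  # 4
```

The pre-fork "A" sits in a userspace buffer that fork duplicates, so each process flushes "AA" at exit: four 'A's in total, exactly as in the redirected C version.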

An introduction to buffering on Linux

stdio streams come in three buffering modes: unbuffered, line-buffered, and fully buffered.

Demo

Some programs print to the screen fine, but when their output is redirected to a file and they are stopped with Ctrl+C, the file ends up empty. For example:

#include <stdio.h>
#include <unistd.h>

int main(void) {
  while (1) {
    printf("Hello World\n");
    sleep(1);
  }
}
This program loops forever, so it has to be stopped manually, and it keeps printing Hello World. But redirect its output to a file:

#./a.out > r.log
and when you stop it manually, the file is still empty: with a file as stdout the stream became fully buffered, and the buffer was never flushed before the process died.
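One cure is to flush explicitly. A Python sketch of the same experiment with an explicit flush, so the output survives the kill (file name and timings here are arbitrary):

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# A writer that flushes after every line -- the fix for the empty-file problem.
code = ('import time\n'
        'while True:\n'
        '    print("Hello World", flush=True)\n'
        '    time.sleep(0.2)\n')

fd, path = tempfile.mkstemp(suffix=".log")
os.close(fd)
with open(path, "w") as out:
    p = subprocess.Popen([sys.executable, "-c", code], stdout=out)
    time.sleep(1)                 # let it print a few lines
    p.send_signal(signal.SIGINT)  # the manual Ctrl+C
    p.wait()

content = open(path).read()
print("Hello World" in content)  # True: flushed lines reached the file
```

In the C version the equivalent fixes are fflush(stdout) after each printf, setbuf to change the buffering mode, or running the program under stdbuf.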

File-handling idioms in shell

Reading lines

while read -r line; do
    echo "$line"
done < "$1"

cat "$1" | while read -r line; do
    echo "$line"
done

With for the result differs slightly: for iterates over whitespace-separated words, not lines.

for word in $(cat "$1"); do
    echo "$word"
done
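The difference is easy to demonstrate by driving bash from a small harness (bash and a temporary file are assumed):

```python
import subprocess
import tempfile

# A file with two lines; the first line contains a space.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("a b\nc\n")
    path = f.name

script = (
    f'while read -r line; do echo "[$line]"; done < {path}; '
    'echo ----; '
    f'for word in $(cat {path}); do echo "[$word]"; done'
)
out = subprocess.run(["bash", "-c", script],
                     capture_output=True, text=True).stdout
print(out)  # [a b] / [c] / ---- / [a] / [b] / [c], one per line
```

while read preserves the line "a b" as a unit; for splits it into two separate words.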

reference

https://bash.cyberciti.biz/guide/Reads_from_the_file_descriptor_(fd)

setup sftp service

curlftpfs and sshfs clients

On Debian I had long used curlftpfs to mount a remote directory locally. After a system upgrade the package was gone: FTP is insecure, so sshfs is now recommended instead.

reference

https://www.linuxtechi.com/configure-sftp-chroot-debian10/

python argument types

# accepts a tuple or a list (any iterable)
def calc(numbers):
    sum = 0
    for n in numbers:
        sum = sum + n
    print(sum)

calc([1,2])
calc((1,2,3))

# accepts variadic positional arguments
def calc2(*numbers):
    sum = 0
    for n in numbers:
        sum = sum + n
    return sum

calc2(0)
calc2(1,2,3,4)

numbers = [1,2,3]
# unpack a list into variadic arguments
print(calc2(*numbers))


# variadic keyword arguments are collected into a dict

# accepts only variadic keyword arguments
def person2(**kw):
    if 'hello' in kw:
        pass

    print(kw)

# positional arguments combined with variadic keyword arguments
def person(name, age, **kw):
    print('name', name, 'age', age, 'other', kw)

person('mike',12)
person('mike',12, city='sh')
person('mike',12, city='sh', birth=1990)
person2(name='mike',age = 12, city='sh', birth=1990)

# combine positional, variadic positional, and variadic keyword arguments
def f2(a, b, c=0, *d, **kw):
    print('a =', a, 'b =', b, 'c =', c, 'd =', d, 'kw =', kw)

cs=[3,4]
f2(1,2,*cs,hello='hello')


# use * to mark the following parameters as keyword-only
def f3(a,b,c,*,d,e):
    print('a =', a, 'b =', b, 'c =', c, 'd =', d, 'e =', e)

# no other keyword arguments are accepted
f3(1,2,3,d=4,e=5)

# a tuple and a dict can be used to call any function,
# so *args and **kw are commonly used to forward all arguments
# to another function, whatever its signature
args = (1, 2)
kw = {'c': 1, 'd': 2}
f2(*args, **kw)  # equivalent to f2(1, 2, c=1, d=2)

# finally: prefer immutable types for default parameter values
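That last piece of advice deserves an example: default values are evaluated once, at function definition time, so a mutable default is shared across calls. The function names below are illustrative:

```python
# BAD: the default list is created once and reused on every call
def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

# GOOD: use an immutable sentinel and create the list inside
def append_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

append_bad(1)
print(append_bad(2))   # [1, 2] -- the two calls shared one list
append_good(1)
print(append_good(2))  # [2]    -- each call got a fresh list
```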

A collection of bash tips

The Linux command line is an enormously productive tool. Many real-world problems can be solved without writing elaborate program code, using just a few lines of commands.

shell 内建命令

当执行which的时候,比如which echo 会输出'shell built-in command'而不是返回路径,就说明这个命令是内建命令
shell在执行内建命令时是直接执行,而非内建命令则fork一个子进程来执行

命令行的问题很多与缓冲buff、重定向pipe、delimiter分割符有关

Tricks: replacing programs with commands

Use xargs for multi-process parallelism

xargs can process its input with multiple concurrent processes (the -P option). Combined with find it is very powerful, and combined with wget it makes a parallel crawler.

EXAMPLES
       find /tmp -name core -type f -print | xargs /bin/rm -f

       Find files named core in or below the directory /tmp and delete them.  Note that this will work incorrectly if there are any filenames containing newlines or spaces.

       find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f

       Find files named core in or below the directory /tmp and delete them, processing filenames in such a way that file or directory names containing spaces or newlines  are  correctly
       handled.

       find /tmp -depth -name core -type f -delete

       Find  files  named  core in or below the directory /tmp and delete them, but more efficiently than in the previous example (because we avoid the need to use fork(2) and exec(2) to
       launch rm and we don't need the extra xargs process).

       cut -d: -f1 < /etc/passwd | sort | xargs echo

       Generates a compact listing of all the users on the system.
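The man-page examples above run the command sequentially; the concurrency mentioned earlier comes from xargs's -P option. A sketch, driven from Python so the output can be checked (GNU seq and xargs are assumed to be installed):

```python
import subprocess

# -n1: one argument per command invocation; -P4: up to four echo
# processes running at once.
out = subprocess.run("seq 1 8 | xargs -n1 -P4 echo",
                     shell=True, capture_output=True, text=True).stdout

# Parallel workers may finish out of order, but all eight lines arrive.
print(sorted(out.split(), key=int))  # ['1', '2', ..., '8']
```

Replace echo with wget -q and feed a list of URLs, and you have the parallel downloader the text describes.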

find -exec usage

find . -exec grep chrome {} \;
find . -exec grep chrome {} +

  1. \; is an escaped semicolon.
  2. {} is replaced with each filename that find produces.
  3. The difference between ; and +: with ;, grep is executed once per file; with +, all the filenames are passed to a single grep invocation, so grep runs only once.
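The once-per-file vs once-in-total difference can be observed directly by substituting echo for grep (the temporary files below are purely illustrative):

```python
import os
import subprocess
import tempfile

d = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    open(os.path.join(d, name), "w").close()

# With ';' echo runs once per file: three output lines.
per_file = subprocess.run(
    ["find", d, "-type", "f", "-exec", "echo", "{}", ";"],
    capture_output=True, text=True).stdout
# With '+' all names go to a single echo: one output line.
batched = subprocess.run(
    ["find", d, "-type", "f", "-exec", "echo", "{}", "+"],
    capture_output=True, text=True).stdout

print(len(per_file.splitlines()), len(batched.splitlines()))  # 3 1
```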

Use awk's ORS to change the output separator

ls separates entries with newlines; awk can turn those newlines into |:

ls | awk '{ ORS="|"; print; }'

echo $(ls) instead turns ls's newlines into spaces.
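The same ORS effect, shown without depending on the contents of a directory listing:

```python
import subprocess

# awk prints each record followed by ORS instead of a newline.
out = subprocess.run(
    "printf 'a\\nb\\nc\\n' | awk '{ ORS=\"|\"; print; }'",
    shell=True, capture_output=True, text=True).stdout
print(out)  # a|b|c|
```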

Use declare to set a variable's type and attributes

declare can give a variable a type or attribute: '-i' integer, '-r' read-only (equivalent to the shell's readonly), '-g' global. Shell variables default to strings; after '-i', arithmetic no longer needs let:

#retry=0
declare -i retry=0
while [ $retry -lt 30 ] ; do
ps aux --cols=1024 | grep xxx
if [ $? -eq 0 ] ; then
        exit 0
fi
sleep 1
#let retry=$retry+1
retry=$retry+1
done
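The effect of -i is easy to check; a small sketch driven from Python so the result is testable (bash is assumed):

```python
import subprocess

script = (
    'declare -i n=0; n=n+5; echo "$n"; '  # -i: assignment is arithmetic
    'm=0; m=m+5; echo "$m"'               # no -i: plain string assignment
)
out = subprocess.run(["bash", "-c", script],
                     capture_output=True, text=True).stdout.split()
print(out)  # ['5', 'm+5']
```

With -i the right-hand side is evaluated as an arithmetic expression; without it, m just receives the literal string "m+5".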

Go study notes

Why Go has pointers, and when you must use them

  1. When passing a WaitGroup (it must not be copied)
  2. When reading command-line flags

Globals and multi-value assignment

Since many functions also return an error, a := inside a function can accidentally shadow a global variable:

var global_var *os.File

func foo() {
  // BUG: := declares a NEW local global_var, shadowing the global
  global_var, err := os.Open("file")
  _, _ = global_var, err
}

It should be:

var global_var *os.File

func foo() {
  var err error
  global_var, err = os.Open("file") // assigns to the global
  _ = err
}

Arrays vs slices

Arrays and slices are used like C++'s std::array and std::vector: an array has a fixed length, while a slice grows (roughly doubling its capacity).
There is a big difference when passing them as arguments, though: in Go an array is passed by value, copying the entire array, whereas a slice header contains a pointer to its backing array and so behaves like a reference. That is why slices are used far more often; the same goes for maps.

定义数组和slice:

// arrays
var a [3]int
arr := [5]int{1, 2, 3, 4, 5}
var array2 = [...]int{6, 7, 8}
q := [...]int{1, 2, 3}  // length inferred from the literal
q2 := [...]int{99: -1}  // elements 0..98 are 0, element 99 is -1 (length 100)

// slices
s1 := []int{1, 2, 3}
a2 := [10]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 0} // a2 is an array
s2 := a2[2:8]             // a slice of an array
s3 := make([]int, 10, 20) // length 10, capacity 20

var vs :=, and which types need make

The two are interchangeable much of the time, but they differ across types.

For the basic types string, int, and bool, and for arrays, var both declares and initializes:

var str string // zero value: the empty string
var inter int  // zero value: 0
var bo bool    // zero value: false

// they can be used immediately
fmt.Printf("%s %d %t", str, inter, bo)

For the slice, map, and chan types, however, var m map[int]string is only a declaration; you still need make to allocate memory and initialize it:

var m map[int]string // m is nil here; writing to it would panic
m = make(map[int]string)
m[1] = "a"
m[2] = "b"
fmt.Println(m[1])

The steps above can be shortened to

m := make(map[int]string) // allocate, then fill in entries

or, directly with a literal,

m := map[int]string{1: "a", 2: "b"}

C++ storage duration, linkage, and scope

Three important properties of C++ variables and functions:

Storage duration: when a variable is created and destroyed. Linkage: whether declarations in different scopes or files refer to the same entity in memory. Scope: where a name is visible.

The identifiers discussed here include both variables and functions.

Storage specifiers

Storage specifiers control when a variable is allocated and freed. They are:

  • automatic
  • thread_local
  • static
  • register
  • mutable
  • extern

Notes:

  • automatic: the ordinary local variable, declared without static or thread_local; lives on the stack and is allocated and destroyed automatically as its block is entered and left
  • static: created at program start and destroyed at program end, but initialized the first time its initialization code runs
  • thread: allocated and destroyed with the thread
  • dynamic: the ordinary heap variable; requires explicit new and delete

In C++11, auto is a type-deduction keyword, not a storage specifier, but automatic storage duration itself still exists (local variables).

When initialization happens

  • automatic: must be initialized explicitly; in other words, an uninitialized local variable has an indeterminate value
  • static: initialized exactly once, when its initialization code first runs; in special cases (constant initialization) before execution begins
  • thread: thread_local variables carry static-like semantics, so treat them like static
  • dynamic: initialized at new

Linkage

An identifier (variable or function) denotes a value or function body stored somewhere in memory; linkage determines whether other occurrences of the same identifier refer to that same memory. C/C++ has three kinds of linkage: no linkage, internal linkage, and external linkage.

  • No linkage: local variables have no linkage, so two locals with the same name are independent; an inner one shadows an outer one. Here linkage roughly coincides with scope.
  • Internal linkage: accessible only within the file (file scope); in other words, not exposed to the linker. Declared with the static specifier, so two files may each declare an internal-linkage identifier with the same name and type, and they refer to different memory.
  • External linkage: accessible everywhere in the program, including other files (global scope), so it is truly "global" (in both scope and linkage); all such identifiers refer to one single piece of memory.

Specifiers

  • Global const and global constexpr variables have internal linkage by default; adding static changes nothing.
  • Global non-const variables have external linkage by default, so adding extern changes nothing. Declaring the variable with extern in another file lets that file use the variable backed by the same memory.
  • Functions have external linkage by default, so adding extern changes nothing. Declaring the function in another file (extern optional) lets that file call the same function.
  • Adding extern to a global const or constexpr variable gives it external linkage.

Note that static and extern each express both storage duration and linkage. static is fairly simple; extern is trickier, as the following cases show:

int g_x = 1;    // defines an initialized global variable (extern optional)
int g_x;        // defines an uninitialized (zero-initialized) global variable (must NOT add extern)
extern int g_x; // forward declaration of a global variable; no initializer allowed

extern const int g_y { 1 }; // defines a global constant; const must be initialized
extern const int g_y;       // forward declaration of a global constant; no initializer

So when defining an uninitialized global variable, do not add extern; otherwise it becomes a forward declaration.

The constexpr special case

Although adding extern gives a constexpr variable external linkage, it cannot be forward-declared in another file. constexpr values are substituted at compile time, and the compiler's visibility is limited to a single translation unit, so in another file the compiler cannot know the value at compile time and thus cannot reach the contents of its memory. Hence a constexpr variable can only be defined, not merely declared, where it is used.

File scope vs global scope

A local variable's scope, no-linkage region, and duration all coincide, running from { to }. Conceptually, global scope subsumes file scope, and linkage determines whether a name can be used from other files.

Local classes

A local class may not have static data members.

References

https://en.cppreference.com/w/cpp/language/storage_duration

Linux tuning

OS vendors don't like to discuss tuning: the topic is endless, it is complicated, and arguably needing to tune implies the defaults aren't good enough.

SUSE's support terms even state:

Explanations of internal mechanisms and system tuning are outside the scope of our technical support.

So here is a little related background.

buffer vs cache: purpose and difference

A buffer holds data on its way out to disk (block devices), while a cache holds data that has been read from disk. Both exist to improve I/O performance.

  • buffer: smooths over the speed mismatch between a fast producer and a slow consumer; the fast side accumulates data in the buffer, which is drained to the slow side bit by bit.
    For example, writing from memory to disk does not happen directly; data is buffered until there is enough to flush to disk.
    A buffer is something that has yet to be "written" to disk.
  • cache: enables reuse of data; frequently used data from a slow device is cached so that it can be served back at high speed.
    For example, data read from disk is kept in an in-memory cache, so later accesses to the same resource are much faster.
    A cache is something that has been "read" from the disk and stored for later use.

In short, both buff and cache sit between memory and disk: the buffer is on the write-to-disk path, the cache on the read-into-memory path.
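The numbers that free reports under buff/cache come from /proc/meminfo; a small sketch reading the two fields directly (Linux only, values in kB):

```python
# Parse /proc/meminfo into a {field: kB} dict.
info = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, rest = line.split(":", 1)
        info[key] = int(rest.split()[0])

# Buffers: block-device writeback staging; Cached: page cache of file reads.
print("Buffers:", info["Buffers"], "kB")
print("Cached: ", info["Cached"], "kB")
```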

Reclaiming cache

Reclaim the page cache via drop_caches:
#sync;sync;sync
#echo 3 > /proc/sys/vm/drop_caches
In my case free gained about 300 MB.

About swap

Swap is a swap partition, a dedicated partition on disk. The kernel moves memory pages out of RAM into the swap partition ("swap out").

Swapping is controlled by the vm.swappiness kernel parameter, default 60; cat /proc/sys/vm/swappiness shows the current value.
The parameter sets how eagerly the kernel uses swap, on a scale from 0 to 100.

Setting it to 0 tells the kernel to avoid swapping processes out of physical memory whenever possible;
setting it to 100 tells the kernel to swap out aggressively. Note that 0 does not disable the swap partition; it only tells the kernel to use swap as little as it can, while vm.swappiness=100 means use it as much as possible.

The actual behavior involves more complex heuristics than swappiness alone. If you assume swap is only touched once all physical memory is exhausted, that is not how it works: I have seen a machine with only 10 MB of physical memory left that still had not used its swap space, and another with 15 GB of memory free that was nevertheless using a little swap. Light swap usage does not hurt performance; only when memory pressure, a memory leak, or a misbehaving process causes frequent, heavy use of the swap partition does it become a serious performance problem.
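Reading the current value programmatically is the same as cat'ing the sysctl file (Linux only; the default shown in the comment is typical, not guaranteed):

```python
# Equivalent to `cat /proc/sys/vm/swappiness`.
with open("/proc/sys/vm/swappiness") as f:
    swappiness = int(f.read())
print(swappiness)  # 60 on most default installs
```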

Q: when is swap used?

As said above, this is hard to pin down. In theory, when physical memory runs short and more data must be read in, the kernel swaps out pages belonging to programs that have not been used for a long time.
In practice, though, you will often find the kernel using swap even when memory is plentiful.

Q: what gets swapped out?

See the test below.

Reclaiming swap

After swapoff, run sudo sysctl vm.swappiness=0 to temporarily stop the kernel from swapping out.

Load the swapped-out data back into memory and restart swap:
#swapoff -a
#swapon -a
This empties the swap partition. My own test, on kernel 5.10.0-8-amd64:

               total        used        free      shared  buff/cache   available
Mem:        12162380     4911564     5605744      459364     1645072     6466572
Swap:        1000444      763040      237404

After restarting swap:

               total        used        free      shared  buff/cache   available
Mem:        12162380     5605800     4843176      524984     1713404     5707112
Swap:        1000444           0     1000444

As you can see, after disabling swap, most of swap's used moved into Mem's used, and a small part into Mem's shared.

Some effective tuning tools

  • perf + flame graphs: show where run time goes, down to per-function call cost; for your own programs this tells you which functions to optimize
  • vmstat: shows disk I/O; with vmstat -t 3, a persistently large number in the b column means processes are blocked on I/O, which may indicate a failing disk or a poorly designed program

Plus top, iperf, and so on.

grpc callback api

C++ callback-based asynchronous API

  • Author(s): vjpai, sheenaqotj, yang-g, zhouyihaiding
  • Approver: markdroth
  • Status: Proposed
  • Implemented in: https://github.com/grpc/grpc/projects/12
  • Last updated: March 22, 2021
  • Discussion at https://groups.google.com/g/grpc-io/c/rXLdWWiosWg

Abstract

Provide an asynchronous gRPC API for C++ in which the completion of RPC actions in the library will result in callbacks to user code.

Background

Since its initial release, gRPC has provided two C++ APIs:

  • Synchronous API
    • All RPC actions (such as unary calls, streaming reads, streaming writes, etc.) block for completion
    • Library provides a thread-pool so that each incoming server RPC executes its method handler in its own thread
  • Completion-queue-based (aka CQ-based) asynchronous API
    • Application associates each RPC action that it initiates with a tag
    • The library performs each RPC action
    • The library posts the tag of a completed action onto a completion queue
    • The application must poll the completion queue to determine which asynchronously-initiated actions have completed
    • The application must provide and manage its own threads
    • Server RPCs don't have any library-invoked method handler; instead the application is responsible for executing the actions for an RPC once it is notified of an incoming RPC via the completion queue

The goal of the synchronous version is to be easy to program. However, this comes at the cost of high thread-switching overhead and high thread storage for systems with many concurrent RPCs. On the other hand, the asynchronous API allows the application full control over its threading and thus can scale further. The biggest problem with the asynchronous API is that it is just difficult to use. Server RPCs must be explicitly requested, RPC polling must be explicitly controlled by the application, lifetime management is complicated, etc. These have proved sufficiently difficult that the full features of the asynchronous API are basically never used by applications. Even if one can use the async API correctly, it also presents challenges in deciding how many completion queues to use and how many threads to use for polling them, as one can either optimize for reducing thread hops, avoiding stranding, reducing CQ contention, or improving locality. These goals are often in conflict and require substantial tuning.

  • The C++ callback API has an implementation that is built on top of a new callback completion queue in core. There is also another implementation, discussed below.
  • The API structure has substantial similarities to the gRPC-Node and gRPC-Java APIs.

Proposal

The callback API is designed to have the performance and thread scalability of an asynchronous API without the burdensome programming model of the completion-queue-based model. In particular, the following are fundamental guiding principles of the API:

  • Library directly calls user-specified code at the completion of RPC actions. This user code is run from the library's own threads, so it is very important that it must not wait for completion of any blocking operations (e.g., condition variable waits, invoking synchronous RPCs, blocking file I/O).
  • No explicit polling required for notification of completion of RPC actions.
  • In practice, these requirements mean that there must be a library-controlled poller for monitoring such actions. This is discussed in more detail in the Implementation section below.
  • As in the synchronous API, server RPCs have an application-defined method handler function as part of their service definition. The library invokes this method handler when a new server RPC starts.
  • Like the synchronous API and unlike the completion-queue-based asynchronous API, there is no need for the application to "request" new server RPCs. Server RPC context structures will be allocated and have their resources allocated as and when RPCs arrive at the server.

Reactor model

The most general form of the callback API is built around a reactor model. Each type of RPC has a reactor base class provided by the library. These types are:

  • ClientUnaryReactor and ServerUnaryReactor for unary RPCs
  • ClientBidiReactor and ServerBidiReactor for bidi-streaming RPCs
  • ClientReadReactor and ServerWriteReactor for server-streaming RPCs
  • ClientWriteReactor and ServerReadReactor for client-streaming RPCs

Client RPC invocations from a stub provide a reactor pointer as one of their arguments, and the method handler of a server RPC must return a reactor pointer.

These base classes provide three types of methods:

  1. Operation-initiation methods: start an asynchronous activity in the RPC. These are methods provided by the class and are not virtual. These are invoked by the application logic. All of these have a void return type. The ReadMessageType below is the request type for a server RPC and the response type for a client RPC; the WriteMessageType is the response type for a server RPC or the request type for a client RPC.
    • void StartCall(): (Client only) Initiates the operations of a call from the client, including sending any client-side initial metadata associated with the RPC. Must be called exactly once. No reads or writes will actually be started until this is called (i.e., any previous calls to StartRead, StartWrite, or StartWritesDone will be queued until StartCall is invoked). This operation is not needed at the server side since streaming operations at the server are released from backlog automatically by the library as soon as the application returns a reactor from the method handler, and because there is a separate method just for sending initial metadata.
    • void StartSendInitialMetadata(): (Server only) Sends server-side initial metadata. To be used in cases where initial metadata should be sent without sending a message. Optional; if not called, initial metadata will be sent when StartWrite or Finish is called. May not be invoked more than once or after StartWrite or Finish has been called. This does not exist at the client because sending initial metadata is part of StartCall.
    • void StartRead(ReadMessageType*): Starts a read of a message into the object pointed to by the argument. OnReadDone will be invoked when the read is complete. Only one read may be outstanding at any given time for an RPC (though a read and a write can be concurrent with each other). If this operation is invoked by a client before calling StartCall or by a server before returning from the method handler, it will be queued until one of those events happens and will not actually trigger any activity or reactions until it is thereby released from the queue.
    • void StartWrite(const WriteMessageType*): Starts a write of the object pointed to by the argument. OnWriteDone will be invoked when the write is complete. Only one write may be outstanding at any given time for an RPC (though a read and a write can be concurrent with each other). As with StartRead, if this operation is invoked by a client before calling StartCall or by a server before returning from the method handler, it will be queued until one of those events happens and will not actually trigger any activity or reactions until it is thereby released from the queue.
    • void StartWritesDone(): (Client only) For client RPCs to indicate that there are no more writes coming in this stream. OnWritesDoneDone will be invoked when this operation is complete. This causes future read operations on the server RPC to indicate that there is no more data available. Highly recommended but technically optional; may not be called more than once per call. As with StartRead and StartWrite, if this operation is invoked by a client before calling StartCall or by a server before returning from the method handler, it will be queued until one of those events happens and will not actually trigger any activity or reactions until it is thereby released from the queue.
    • void Finish(Status): (Server only) Sends completion status to the client, asynchronously. Must be called exactly once for all server RPCs, even for those that have already been cancelled. No further operation-initiation methods may be invoked after Finish.
  2. Operation-completion reaction methods: notification of completion of asynchronous RPC activity. These are all virtual methods that default to an empty function (i.e., {}) but may be overridden by the application's reactor definition. These are invoked by the library. All of these have a void return type. Most take a bool ok argument to indicate whether the operation completed "normally," as explained below.
    • void OnReadInitialMetadataDone(bool ok): (Client only) Invoked by the library to notify that the server has sent an initial metadata response to a client RPC. If ok is true, then the RPC received initial metadata normally. If it is false, there is no initial metadata either because the call has failed or because the call received a trailers-only response (which means that there was no actual message and that any information normally sent in initial metadata has been dispatched instead to trailing metadata, which is allowed in the gRPC HTTP/2 transport protocol). This reaction is automatically invoked by the library for RPCs of all varieties; it is uncommonly used as an application-defined reaction however.
    • void OnReadDone(bool ok): Invoked by the library in response to a StartRead operation. The ok argument indicates whether a message was read as expected. A false ok could mean a failed RPC (e.g., cancellation) or a case where no data is possible because the other side has already ended its writes (e.g., seen at the server-side after the client has called StartWritesDone).
    • void OnWriteDone(bool ok): Invoked by the library in response to a StartWrite operation. The ok argument indicates whether the write was successfully sent; a false value indicates an RPC failure.
    • void OnWritesDoneDone(bool ok): (Client only) Invoked by the library in response to a StartWritesDone operation. The ok argument indicates whether the writes-done operation was successfully completed; a false value indicates an RPC failure.
    • void OnCancel(): (Server only) Invoked by the library if an RPC is canceled before it has a chance to successfully send status to the client side. The reaction may be used for any cleanup associated with cancellation or to guide the behavior of other parts of the system (e.g., by setting a flag in the service logic associated with this RPC to stop further processing since the RPC won't be able to send outbound data anyway). Note that servers must call Finish even for RPCs that have already been canceled as this is required to cleanup all their library state and move them to a state that allows for calling OnDone.
    • void OnDone(const Status&) at the client, void OnDone() at the server: Invoked by the library when all outstanding and required RPC operations are completed for a given RPC. For the client-side, it additionally provides the status of the RPC (either as sent by the server with its Finish call or as provided by the library to indicate a failure), in which case the signature is void OnDone(const Status&). The server version has no argument, and thus has a signature of void OnDone(). Should be used for any application-level RPC-specific cleanup.
    • Thread safety: the above calls may take place concurrently, except that OnDone will always take place after all other reactions. No further RPC operations are permitted to be issued after OnDone is invoked.
    • IMPORTANT USAGE NOTE: code in any reaction must not block for an arbitrary amount of time since reactions are executed on a finite-sized, library-controlled threadpool. If any long-term blocking operations (like sleeps, file I/O, synchronous RPCs, or waiting on a condition variable) must be invoked as part of the application logic, then it is important to push that outside the reaction so that the reaction can complete in a timely fashion. One way of doing this is to push that code to a separate application-controlled thread.
  3. RPC completion-prevention methods. These are methods provided by the class and are not virtual. They are only present at the client-side because the completion of a server RPC is clearly requested when the application invokes Finish. These methods are invoked by the application logic. All of these have a void return type.
    • void AddHold(): (Client only) This prevents the RPC from being considered complete (ready for OnDone) until each AddHold on an RPC's reactor is matched to a corresponding RemoveHold. An application uses this operation before it performs any extra-reaction flows, which refers to streaming operations initiated from outside a reaction method. Note that an RPC cannot complete before StartCall, so holds are not needed for any extra-reaction flows that take place before StartCall. As long as there are any holds present on an RPC, though, it may not have OnDone called on it, even if it has already received server status and has no other operations outstanding. May be called 0 or more times on any client RPC.
    • void AddMultipleHolds(int holds): (Client only) Shorthand for holds invocations of AddHold.
    • void RemoveHold(): (Client only) Removes a hold reference on this client RPC. Must be called exactly as many times as AddHold was called on the RPC, and may not be called more times than AddHold has been called so far for any RPC. Once all holds have been removed, the server has provided status, and all outstanding or required operations have completed for an RPC, the library will invoke OnDone for that RPC.

Examples are provided in the PR to de-experimentalize the callback API.

Unary RPC shortcuts

As a shortcut, client-side unary RPCs may bypass the reactor model by directly providing a std::function for the library to call at completion rather than a reactor object pointer. This is passed as the final argument to the stub call, just as the reactor would be in the more general case. This is semantically equivalent to a reactor in which the OnDone function simply invokes the specified function (but can be implemented in a slightly faster way since such an RPC will definitely not wait separately for initial metadata from the server) and all other reactions are left empty. In practice, this is the common and recommended model for client-side unary RPCs, unless they have a specific need to wait for initial metadata before getting their full response message. As in the reactor model, the function provided as a callback may not include operations that block for an arbitrary amount of time.

Server-side unary RPCs have the option of returning a library-provided default reactor when their method handler is invoked. This is provided by calling DefaultReactor on the CallbackServerContext. This default reactor provides a Finish method, but does not provide a user callback for OnCancel and OnDone. In practice, this is the common and recommended model for most server-side unary RPCs unless they specifically need to react to an OnCancel callback or do cleanup work after the RPC fully completes.

ServerContext extensions

ServerContext is now made a derived class of ServerContextBase. There is a new derived class of ServerContextBase called CallbackServerContext which provides a few additional functions:

  • ServerUnaryReactor* DefaultReactor() may be used by a method handler to return a default reactor from a unary RPC.
  • RpcAllocatorState* GetRpcAllocatorState: see advanced topics section

Additionally, the AsyncNotifyWhenDone function is not present in the CallbackServerContext.

All method handler functions for the callback API take a CallbackServerContext* as their first argument. ServerContext (used for the sync and CQ-based async APIs) and CallbackServerContext (used for the callback API) actually use the same underlying structure and thus their object pointers are meaningfully convertible to each other via a static_cast to ServerContextBase*. We recommend that any helper functions that need to work across API variants should use a ServerContextBase pointer or reference as their argument rather than a ServerContext or CallbackServerContext pointer or reference. For example, ClientContext::FromServerContext now uses a ServerContextBase* as its argument; this is not a breaking API change since the argument is now a parent class of the previous argument's class.

Advanced topics

Application-managed server memory allocation

Callback services must allocate an object for the CallbackServerContext and for the request and response objects of a unary call. Applications can supply a per-method custom memory allocator for gRPC server to use to allocate and deallocate the request and response messages, as well as a per-server custom memory allocator for context objects. These can be used for purposes like early or delayed release, freelist-based allocation, or arena-based allocation. For each unary RPC method, there is a generated method in the server called SetMessageAllocatorFor_*MethodName* . For each server, there is a method called SetContextAllocator. Each of these has numerous classes involved, and the best examples for how to use these features live in the gRPC tests directory.

Generic (non-code-generated) services

RegisterCallbackGenericService is a new method of ServerBuilder to allow for processing of generic (unparsed) RPCs. This is similar to the pre-existing RegisterAsyncGenericService but uses the callback API and reactors rather than the CQ-based async API. It is expected to be used primarily for generic gRPC proxies where the exact serialization format or list of supported methods is unknown.

Per-method specification

Just as with async services, callback services may also be specified on a method-by-method basis (using the syntax WithCallbackMethod_*MethodName*), with any unlisted methods being treated as sync RPCs. The shorthand CallbackService declares every method as being processed by the callback API. For example:

  • Foo::Service -- purely synchronous service
  • Foo::CallbackService -- purely callback service
  • Foo::WithCallbackMethod_Bar<Service> -- synchronous service except for callback method Bar
  • Foo::WithCallbackMethod_Bar<WithCallbackMethod_Baz<Service>> -- synchronous service except for callback methods Bar and Baz

Rationale

Besides the content described in the background section, the rationale also includes early and consistent user demand for this feature as well as the fact that many users were simply spinning up a callback model on top of gRPC's completion queue-based asynchronous model.

Implementation

There is more than one mechanism available for implementing the background polling required by the C++ callback API. One has been implemented on top of the C++ completion queue API. In this approach, the callback API uses a number of library-owned threads to call Next on an async CQ that is owned by the internal implementation. Currently, the thread count is automatically selected by the library with no user input and is set to half the system's core count, but no less than 2 and no more than 16. This selection is subject to change in the future based on our team's ongoing performance analysis and tuning efforts. Despite being built on the CQ-based async API, the developer using the callback API does not need to consider any of the CQ details (e.g., shutdown, polling, or even the existence of a CQ).

It is the gRPC team's intention that that implementation is only a temporary solution. A new structure called an EventEngine is being developed to provide the background threads needed for polling, and this system is also intended to provide a direct API for application use. This event engine would also allow the direct use of the core callback API that is currently only used by the Python async implementation. If this solution is adopted, there will be a new gRFC for it. This new implementation will not change the callback API at all but rather will only affect its performance. The C++ code for the callback API already has if branches in place to support the use of a poller that directly supplies the background threads, so the callback API will naturally layer on top of the EventEngine without further development effort.

Open issues (if applicable)

N/A. The gRPC C++ callback API has been used internally at Google for two years now, and the code and API have evolved substantially during that period.