Translation: Protobuf Overview

牧歌 · 2024/7/11 · about 15 minutes

Protobuf Overview

Source article: https://protobuf.dev/overview/

Overview

Protocol Buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data.

It’s like JSON, except it’s smaller and faster, and it generates native language bindings. You define how you want your data to be structured once, then use generated source code to write and read that structured data to and from a variety of data streams, in a variety of languages.

Protocol buffers are a combination of the definition language (created in .proto files), the code that the proto compiler generates to interface with data, language-specific runtime libraries, the serialization format for data that is written to a file (or sent across a network connection), and the serialized data.

## What Problems do Protocol Buffers Solve?

Protocol buffers provide a serialization format for packets of typed, structured data that are up to a few megabytes in size. The format is suitable for both ephemeral network traffic and long-term data storage. Protocol buffers can be extended with new information without invalidating existing data or requiring code to be updated.

Protocol buffers are the most commonly used data format at Google. They are used extensively in inter-server communications as well as for archival storage of data on disk. Protocol buffer messages and services are described by engineer-authored .proto files. The following shows an example message:

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;
}

The proto compiler is invoked at build time on .proto files to generate code in various programming languages (covered in Cross-language Compatibility later in this topic) to manipulate the corresponding protocol buffer. Each generated class contains simple accessors for each field and methods to serialize and parse the whole structure to and from raw bytes. The following shows you an example that uses those generated methods:

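// Build an immutable Person with the generated builder.
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Serialize the message to the file named by the first argument.
try (FileOutputStream output = new FileOutputStream(args[0])) {
    john.writeTo(output);
}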

Because protocol buffers are used extensively across all manner of services at Google and data within them may persist for some time, maintaining backwards compatibility is crucial. Protocol buffers allow for the seamless support of changes, including the addition of new fields and the deletion of existing fields, to any protocol buffer without breaking existing services. For more on this topic, see Updating Proto Definitions Without Updating Code, later in this topic.

What are the Benefits of Using Protocol Buffers?

Protocol buffers are ideal for any situation in which you need to serialize structured, record-like, typed data in a language-neutral, platform-neutral, extensible manner. They are most often used for defining communications protocols (together with gRPC) and for data storage.

Some of the advantages of using protocol buffers include:

Compact data storage
Fast parsing
Availability in many programming languages
Optimized functionality through automatically generated classes
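
To make the "compact data storage" point concrete, here is a minimal sketch that hand-encodes the Person message from earlier and compares its size with an equivalent JSON string. It assumes the standard protobuf wire format (a tag of `(field_number << 3) | wire_type`, varints for integers, length-delimited strings). The class `PersonWireSize` and its helpers are hypothetical names written for illustration; real code would use the classes that protoc generates rather than encoding bytes by hand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PersonWireSize {
    // Writes a non-negative int as a varint: 7 bits per byte,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Writes a string field: tag, byte length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Hand-encodes Person{name = 1, id = 2, email = 3}; id must be non-negative here.
    static byte[] encodePerson(String name, int id, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeVarint(out, (2 << 3) | 0); // field 2, wire type 0 = varint
        writeVarint(out, id);
        writeString(out, 3, email);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] proto = encodePerson("John Doe", 1234, "jdoe@example.com");
        String json = "{\"name\":\"John Doe\",\"id\":1234,\"email\":\"jdoe@example.com\"}";
        System.out.println("protobuf bytes: " + proto.length);
        System.out.println("JSON bytes:     " + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The binary form is smaller mainly because field names are replaced by one-byte tags and the integer is stored as a two-byte varint instead of text.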

Cross-language Compatibility

The same messages can be read by code written in any supported programming language. You can have a Java program on one platform capture data from one software system, serialize it based on a .proto definition, and then extract specific values from that serialized data in a separate Python application running on another platform.

The following languages are supported directly in the protocol buffers compiler, protoc:

The following languages are supported by Google, but the projects’ source code resides in GitHub repositories. The protoc compiler uses plugins for these languages:

Cross-project Support

You can use protocol buffers across projects by defining message types in .proto files that reside outside of a specific project’s code base. If you’re defining message types or enums that you anticipate will be widely used outside of your immediate team, you can put them in their own file with no dependencies.

A couple of examples of proto definitions widely used within Google are timestamp.proto and status.proto.

Updating Proto Definitions Without Updating Code

It’s standard for software products to be backward compatible, but it is less common for them to be forward compatible. As long as you follow some simple practices when updating .proto definitions, old code will read new messages without issues, ignoring any newly added fields. To the old code, fields that were deleted will have their default value, and deleted repeated fields will be empty. For information on what “repeated” fields are, see Protocol Buffers Definition Syntax later in this topic.

New code will also transparently read old messages. New fields will not be present in old messages; in these cases protocol buffers provide a reasonable default value.
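
The compatibility behavior described above follows from how a reader walks the wire format. The sketch below is a hand-written, deliberately simplified reader (not the real protobuf runtime) that stands in for "old code" knowing only fields 1 (`name`) and 2 (`id`). It assumes only wire types 0 (varint) and 2 (length-delimited) appear; `OldReaderDemo`, `OldPerson`, and `read` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class OldReaderDemo {
    /** What the "old" code knows: only name (field 1) and id (field 2). */
    static final class OldPerson {
        String name = ""; // default when field 1 is absent
        int id = 0;       // default when field 2 is absent
    }

    private final byte[] buf;
    private int pos = 0;

    OldReaderDemo(byte[] buf) { this.buf = buf; }

    // Reads a varint: 7 payload bits per byte, least significant group first.
    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    private OldPerson parse() {
        OldPerson p = new OldPerson();
        while (pos < buf.length) {
            int tag = (int) readVarint();
            int fieldNumber = tag >>> 3;
            int wireType = tag & 0x7;
            if (wireType == 0) {          // varint payload
                long v = readVarint();
                if (fieldNumber == 2) p.id = (int) v; // other numbers: unknown, ignored
            } else if (wireType == 2) {   // length-delimited payload
                int len = (int) readVarint();
                if (fieldNumber == 1) {
                    p.name = new String(buf, pos, len, StandardCharsets.UTF_8);
                }
                pos += len;               // skip the payload whether known or not
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return p;
    }

    static OldPerson read(byte[] data) { return new OldReaderDemo(data).parse(); }

    public static void main(String[] args) {
        // Written by "new" code: name = "Al", id = 7, plus email = "a@b" in field 3.
        byte[] newMsg = {0x0A, 2, 'A', 'l', 0x10, 7, 0x1A, 3, 'a', '@', 'b'};
        OldPerson p = read(newMsg);
        System.out.println(p.name + " " + p.id); // field 3 is skipped silently

        // A message with no id field: the reader falls back to the default.
        byte[] noId = {0x0A, 2, 'A', 'l'};
        System.out.println(read(noId).id);
    }
}
```

Because every field carries its own number and wire type, the reader can step over fields it does not recognize, and any field that never appears simply keeps its default.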

When are Protocol Buffers not a Good Fit?

Protocol buffers do not fit all data. In particular:

Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.
When protocol buffers are serialized, the same data can have many different binary serializations. You cannot compare two messages for equality without fully parsing them.
Messages are not compressed. While messages can be zipped or gzipped like any other file, special-purpose compression algorithms like the ones used by JPEG and PNG will produce much smaller files for data of the appropriate type.
Protocol buffer messages are less than maximally efficient in both size and speed for many scientific and engineering uses that involve large, multi-dimensional arrays of floating point numbers. For these applications, FITS and similar formats have less overhead.
Protocol buffers are not well supported in non-object-oriented languages popular in scientific computing, such as Fortran and IDL.
Protocol buffer messages don’t inherently self-describe their data, but they have a fully reflective schema that you can use to implement self-description. That is, you cannot fully interpret one without access to its corresponding .proto file.
Protocol buffers are not a formal standard of any organization. This makes them unsuitable for use in environments with legal or other requirements to build on top of standards.
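
The point about multiple binary serializations can be illustrated with a toy example. Parsers accept fields in any order, so two writers may emit the same logical data as different bytes. The sketch below assumes a hypothetical message with two small integer fields (numbers 1 and 2, each value below 128 so every tag and value fits in one byte); `SameDataDifferentBytes` and `decode` are illustrative names, not library APIs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class SameDataDifferentBytes {
    // Decodes a buffer that holds only single-byte varint fields
    // into a map of field number -> value. Wire type 0 is assumed.
    static Map<Integer, Integer> decode(byte[] buf) {
        Map<Integer, Integer> fields = new TreeMap<>();
        for (int i = 0; i + 1 < buf.length; i += 2) {
            int fieldNumber = (buf[i] & 0xFF) >>> 3;
            fields.put(fieldNumber, buf[i + 1] & 0xFF);
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] a = {0x08, 1, 0x10, 2}; // field 1 = 1, then field 2 = 2
        byte[] b = {0x10, 2, 0x08, 1}; // the same fields written in the other order
        System.out.println(Arrays.equals(a, b));         // byte-level comparison
        System.out.println(decode(a).equals(decode(b))); // comparison after parsing
    }
}
```

The byte arrays differ while the decoded contents match, which is why equality has to be checked on parsed messages rather than on raw bytes.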

Who Uses Protocol Buffers?

Many projects use protocol buffers, including the following:

How do Protocol Buffers Work?

The following diagram shows how you use protocol buffers to work with your data.


Figure 1. Protocol buffers workflow

The code generated by protocol buffers provides utility methods to retrieve data from files and streams, extract individual values from the data, check if data exists, serialize data back to a file or stream, and other useful functions.

The following code samples show you an example of this flow in Java. As shown earlier, this is a .proto definition:

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;
}

Compiling this .proto file creates a Builder class that you can use to create new instances, as in the following Java code:

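// Build an immutable Person with the generated builder.
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Serialize the message to the file named by the first argument.
try (FileOutputStream output = new FileOutputStream(args[0])) {
    john.writeTo(output);
}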

You can then deserialize data using the methods protocol buffers creates in other languages, like C++:

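Person john;
// Open the serialized file in binary mode.
std::fstream input(argv[1], std::ios::in | std::ios::binary);
// Parse the bytes back into a Person; returns false on failure.
john.ParseFromIstream(&input);
int id = john.id();
std::string name = john.name();
std::string email = john.email();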

Protocol Buffers Definition Syntax

When defining .proto files, you can specify that a field is either optional or repeated (proto2 and proto3) or leave it set to the default, implicit presence, in proto3. (The option to set a field to required is absent in proto3 and strongly discouraged in proto2. For more on this, see “Required is Forever” in Specifying Field Rules.)

After setting the optionality/repeatability of a field, you specify the data type. Protocol buffers support the usual primitive data types, such as integers, booleans, and floats. For the full list, see Scalar Value Types.

A field can also be of:

A message type, so that you can nest parts of the definition, such as for repeating sets of data.
An enum type, so you can specify a set of values to choose from.
A oneof type, which you can use when a message has many optional fields and at most one field will be set at the same time.
A map type, to add key-value pairs to your definition.

In proto2, messages can allow extensions to define fields outside of the message itself. For example, the protobuf library’s internal message schema allows extensions for custom, usage-specific options.

For more information about the options available, see the language guide for proto2 or proto3.

After setting optionality and field type, you choose a name for the field. There are some things to keep in mind when setting field names:

It can sometimes be difficult, or even impossible, to change field names after they’ve been used in production.
Field names cannot contain dashes. For more on field name syntax, see Message and Field Names.
Use pluralized names for repeated fields.

After assigning a name to the field, you assign a field number. Field numbers cannot be repurposed or reused. If you delete a field, you should reserve its field number to prevent someone from accidentally reusing the number.

Additional Data Type Support

Protocol buffers support many scalar value types, including integers that use both variable-length encoding and fixed sizes. You can also create your own composite data types by defining messages that are, themselves, data types that you can assign to a field. In addition to the simple and composite value types, several common types are published.
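
As a rough illustration of the difference between variable-length and fixed-size integers, the sketch below encodes the same values both ways and compares byte counts. It assumes the base-128 varint scheme and little-endian fixed32 layout of the protobuf wire format; `VarintDemo`, `encodeVarint`, and `encodeFixed32` are hypothetical helper names, not part of the protobuf runtime.

```java
import java.io.ByteArrayOutputStream;

public class VarintDemo {
    // Encodes a non-negative value as a base-128 varint:
    // 7 bits per byte, continuation bit set on all but the last byte.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Encodes a 32-bit value as four little-endian bytes, regardless of magnitude.
    static byte[] encodeFixed32(int value) {
        return new byte[] {
            (byte) value, (byte) (value >>> 8), (byte) (value >>> 16), (byte) (value >>> 24)
        };
    }

    public static void main(String[] args) {
        long[] samples = {1, 300, 1L << 28};
        for (long v : samples) {
            System.out.println(v + ": varint=" + encodeVarint(v).length
                + " bytes, fixed32=" + encodeFixed32((int) v).length + " bytes");
        }
    }
}
```

Small values take one or two bytes as varints, while values at or above 2^28 need five, so a fixed-size type can be the better choice when a field usually holds large numbers.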

History

To read about the history of the protocol buffers project, see History of Protocol Buffers.

Protocol Buffers Open Source Philosophy

Protocol buffers were open sourced in 2008 as a way to provide developers outside of Google with the same benefits that we derive from them internally. We support the open source community through regular updates to the language as we make those changes to support our internal requirements. While we accept select pull requests from external developers, we cannot always prioritize feature requests and bug fixes that don’t conform to Google’s specific needs.