
Looking for: A good paper on how TLS-in-TLS detection works? #281

Open
mmmray opened this issue Aug 30, 2023 · 61 comments

Comments


mmmray commented Aug 30, 2023

related to #280, it got me thinking about TLS-in-TLS: I wonder if ECH and/or GREASE in the inner layer would (temporarily) confuse those heuristics.

but then, i realized, i have no idea how TLS-in-TLS is detected today. is there an (english) summary of it?

I only see some vague allusions towards it at XTLS/Xray-core#1295 (google translate) which talks about packet sizes and timings being detectable via machine learning. But it's not very concrete about how this detection works (I think it's a good read regardless)


nlifew commented Aug 31, 2023

#266


nlifew commented Aug 31, 2023


RPRX commented Aug 31, 2023

I wonder if ECH and/or GREASE in the inner layer would (temporarily) confuse those heuristics.

No.


A few months ago, the author of a paper studying TLS-in-TLS handshake characteristics (still unpublished) reached out to me and @yuhan6665, and I suggested some revisions.

That paper also made us realize the urgency of implementing Vision Seed, which we happen to be developing right now; I will also add an explanation of its principles to Trojan-killer, among other things.


nekohasekai commented Sep 1, 2023

I wonder if ECH and/or GREASE in the inner layer would (temporarily) confuse those heuristics.

No.

A few months ago, the author of a paper studying TLS-in-TLS handshake characteristics (still unpublished) reached out to me and @yuhan6665, and I suggested some revisions.

That paper also made us realize the urgency of implementing Vision Seed, which we happen to be developing right now; I will also add an explanation of its principles to Trojan-killer, among other things.

If I understand correctly, the paper actually states that their own homemade TLS-in-encrypted-traffic recognition model does not work against protocols like XTLS Vision that simply add padding to the connection header, and there is currently no evidence that the GFW applies such checks either. In other words, the protocol is not only complex and unspecified, it is also meaningless for anti-censorship purposes.

Your Trojan-killer simply checks the first packet length, and you claim it has no false positives on your own device. To a doubter, you even said that you didn't know how to open a .pcap file.

Promoting your project in the anti-censorship community in Chinese and exaggerating the danger of TLS-in-encrypted-traffic characteristics does not actually help anyone understand the GFW.


RPRX commented Sep 1, 2023

If I understand correctly, the paper actually states that their own homemade TLS-in-encrypted-traffic recognition model does not work against protocols like XTLS Vision that simply add padding to the connection header, and there is currently no evidence that the GFW applies such checks either. In other words, the protocol is not only complex and unspecified, it is also meaningless for anti-censorship purposes.

First of all, I don't think it's appropriate for you to publicly describe the contents of someone else's unpublished paper after reading it. Your description covers only part of the paper and is not accurate, but if I said too much I would be revealing even more, including the author's testing methodology, the data for other protocols, and the comparisons between them. The most I can reveal is that, according to the author, the generic TLS-in-TLS handshake detection model is very ineffective against Vision's strong padding; only "casting a mold of the protocol" (a protocol-specific model) works, and that is one of the problems Seed aims to solve.

As for the word "either", let me add some background: this person said TLS-in-TLS detection is hype, and someone else said it is a traffic scam, so I wrote Trojan-killer simply to show that the problem is real. You and I both know the difference between Vision and Trojan, and we both know that Trojan users often report one port being blocked per day; I wonder whether such large-scale real-world user experience counts as the "evidence" you are asking for? Also, don't forget that the "insider" who appeared here said that the GFW has already deployed this kind of detection. He may not be credible, but I have shown that we can implement the detection ourselves, and large-scale user experience corroborates it.

Your Trojan-killer simply checks the first packet length, and you claim it has no false positives on your own device. To a doubter, you even said that you didn't know how to open a .pcap file.

It's not just the first packet. It checks "the total amount of data the client sends from its CCS (inclusive) until the server sends data" and "the total amount of data the server sends before the client sends data again", so what my detection really relies on is a timing and ordering condition. And it's not "no false positives" but roughly one in a thousand (in Tun mode).

As for the "doubter" you mention, he openly told us that "if a simple size match could filter all TLS traffic, the foundations of cryptography would not exist". I'll let everyone judge the level of that statement for themselves; I have already given my assessment. As for that file, I really did wait a long time and couldn't open it. My Wireshark was extremely sluggish during that period; after reinstalling my system I realized it was because I had earlier enabled key export so Wireshark could decrypt Chrome's captures, and the key log kept accumulating until Wireshark became very slow.
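That condition can be sketched roughly as follows. This is a hypothetical reimplementation for illustration only, not Trojan-killer's actual code, and the byte ranges are invented:

```python
# Hypothetical sketch of the described heuristic (not Trojan-killer's actual
# code; the byte ranges below are invented for illustration).

def looks_like_tls_in_tls(events):
    """events: list of ('C'|'S', length) records observed after the client's
    Change Cipher Spec, in order."""
    client_burst = 0  # client bytes from CCS until the server first sends data
    server_burst = 0  # server bytes until the client sends data again
    phase = "client"
    for direction, length in events:
        if phase == "client":
            if direction == "C":
                client_burst += length
            else:
                phase = "server"
                server_burst += length
        else:
            if direction == "S":
                server_burst += length
            else:
                break  # the client spoke again; both sums are final
    # An inner TLS handshake implies roughly a Client-Hello-sized client burst
    # going out and a multi-kilobyte ServerHello..Finished flight coming back.
    return 500 <= client_burst <= 1000 and 2000 <= server_burst <= 8000

# A proxied browser TLS handshake vs. plain request-response traffic:
print(looks_like_tls_in_tls([("C", 640), ("S", 3200), ("S", 1100), ("C", 120)]))  # True
print(looks_like_tls_in_tls([("C", 200), ("S", 900), ("C", 180)]))  # False
```

The point is only that the check amounts to a pair of sums plus an ordering constraint; no machine learning is involved.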

Promoting your project in the anti-censorship community in Chinese and exaggerating the danger of TLS-in-encrypted-traffic characteristics does not actually help anyone understand the GFW.

What special effect does writing in Chinese have? Haven't I always written in Chinese? Is even that an angle to attack from? The real reason I use Chinese is simple: my command of English is nowhere near good enough to vary my linguistic style at will, since it is not my native language, whereas doing that in Chinese is much easier.


nekohasekai commented Sep 1, 2023

The most I can reveal is that, according to the author, the generic TLS-in-TLS handshake detection model is very ineffective against Vision's strong padding

That doesn't seem to be the case.

we both know that Trojan users often report one port being blocked per day

You should be aware that the cause identified at the time was Go TLS fingerprinting, and that there were also reports of unconditional blocking of port 443.

In any case, your method does not use machine learning, does not appear to handle different Trojan implementations, and is not what the GFW uses; and there is no evidence that the GFW applies this kind of machine-learning recognition. Considering the false-positive rates and performance problems of such models, large-scale deployment is still a long way off.

Also, don't forget that the "insider" who appeared here said that the GFW has already deployed this kind of detection

I recall that this insider also claimed the GFW blocks traffic based on characteristics present in AES-encrypted data. Is that also your "the foundations of cryptography would not exist"?

this person said TLS-in-TLS detection is hype, and someone else said it is a traffic scam, so I wrote Trojan-killer simply to show that the problem is real

You are indeed promoting a threat without evidence, and your "detection" does not prove that the GFW actually can, or already does, apply such checks.


mmmray commented Sep 1, 2023

A few months ago, the author of a paper studying TLS-in-TLS handshake characteristics (still unpublished) reached out to me and @yuhan6665, and I suggested some revisions.

Exactly, a paper demonstrating TLS-in-TLS detection is what I am looking for (and specifically, a link), or a paper that explains what testing has been done to identify how the GFW does it. It's hard for me to understand not just the urgency of Vision Seed (on which I will "just trust" you) but also how it works under the hood, because I don't understand what exact features it aims to eliminate. And yes, ideally it would be in English.

I will also say that it is currently very hard for an outsider to study anything related to the XTLS project. That may be intentional, but hence my original question: I am looking for a technical writeup of:

  1. what experiments did you do
  2. what did you observe in those experiments
  3. what do you conclude about the GFW
  4. what is the feature you want to eliminate
  5. how do you eliminate the feature

The announcement of XTLS Vision addresses 4 and 5, and is really vague about 3.

If this is intentional for strategic reasons, that is fine; there is a reason I asked the original question in a research forum, though, and not in the XTLS bug tracker.

@nekohasekai

Exactly, a paper demonstrating TLS-in-TLS is what I am looking for (and specifically, a link), or a paper that explains what testing has been done to identify how the GFW does it.

I can tell you that this paper does not study the behavior of the GFW, but instead constructs a machine-learning model of its own. If it weren't for rprx's publicity, I'm afraid the community wouldn't know how dangerous it is.

With the exception of the XTLS Vision padding, everything about XTLS is performance optimization and is unreviewed, and older versions of it had identifiability issues.

@nekohasekai

This comment was marked as off-topic.


mmmray commented Sep 1, 2023

I think there is merit in proactively eliminating features of a protocol before others get to build detection around them. I did not intend to start a discussion around that, or to discuss how the XTLS project is led. I am here to understand how TLS-in-TLS detection works, and, if the GFW does not do it, how it could be done. Again, even after this long unrelated debate, I have not seen a decent writeup on it.

@nekohasekai

This comment was marked as off-topic.

@RPRX

This comment was marked as off-topic.

@mmmray

This comment was marked as off-topic.


RPRX commented Sep 1, 2023

The most I can reveal is that, according to the author, the generic TLS-in-TLS handshake detection model is very ineffective against Vision's strong padding

That doesn't seem to be the case

Please look at the table carefully: the other protocols (including some muxes) were detected with the generic model, while obfs and Vision were detected with two targeted models, i.e. a mold cast specifically of Vision, which is one of the problems Seed aims to solve. I told the author that the generic model's numbers for detecting Vision should be included; the author's reply to @yuhan6665 and me is what I quoted above, and it is why those numbers were not included at the time.

we both know that Trojan users often report one port being blocked per day

You should be aware that the cause identified at the time was Go TLS fingerprinting, and that there were also reports of unconditional blocking of port 443.

In any case, your method does not use machine learning, does not appear to handle different Trojan implementations, and is not what the GFW uses; and there is no evidence that the GFW applies this kind of machine-learning recognition. Considering the false-positive rates and performance problems of such models, large-scale deployment is still a long way off.

If you think it was only fingerprinting: uTLS has been rolled out this year, and with both Trojan and Vision able to use it, people still frequently report the former getting one port blocked per day.

Trojan-killer is merely an illustrative PoC that exposes the TLS-in-TLS handshake problem; why would it have to be based on machine learning? Nor was it written to be polished enough for the GFW to use. Also, I have no internal data on the false-positive rate or performance of the GFW's method, but I am confident that, with the GFW's resources, detecting this is not a problem.

Also, don't forget that the "insider" who appeared here said that the GFW has already deployed this kind of detection

I recall that this insider also claimed the GFW blocks traffic based on characteristics present in AES-encrypted data. Is that also your "the foundations of cryptography would not exist"?

Not the same person. My comment on the AES-in-AES claim was: "As for AES in AES, I also think it's a bit of a stretch, but he said it's hardware-related, which is not my specialty."

Please note that your statements have contained factual errors of this kind on several occasions.

this person said TLS-in-TLS detection is hype, and someone else said it is a traffic scam, so I wrote Trojan-killer simply to show that the problem is real

You are indeed promoting a threat without evidence, and your "detection" does not prove that the GFW actually can, or already does, apply such checks.

I suggest you test it yourself. Since even we can detect the classic TLS-in-TLS handshake at low cost, why do you still believe the GFW is incapable of deploying such detection?

As to whether the GFW already applies such checks, I addressed that above. It is something people actually run into, a difference borne out by real-world testing.

@RPRX

This comment was marked as off-topic.

@RPRX

This comment was marked as off-topic.

@mmmray

This comment was marked as off-topic.

mmmray closed this as completed Sep 1, 2023

RPRX commented Sep 1, 2023

@mmmray Please don't misunderstand: I meant that he should post a point-by-point "direct response" here; this is just a discussion of technical questions.


nametoolong commented Sep 1, 2023

I guess it is worth actually explaining TLS-in-TLS detection before the flamewar heats up as people tend to get involved in the war without a proper understanding of the underlying issue.

First, let's take a look at TLS. Modern TLS usage is almost provably secure, with LHAE security. That means vulnerabilities keep cropping up, but for most people that's just fine. An attacker's power is greatly limited compared to plaintext protocols:

  • An attacker can learn nothing about the application data itself.
  • An attacker cannot tamper with handshake messages or application data without getting caught.
  • And, in an appropriate setting:
    • an attacker cannot decrypt the stream after its ephemeral state is gone, without a quantum computer.
    • an attacker cannot determine the exact length of an application message.

These properties may seem strong enough for a generic secure transport, but they are not necessarily sufficient. It is common to want to hide some information in the handshake messages, hence ESNI and ECH. It is common to want to bind a TLS stream to another stream, where existing solutions have massive problems. Worse, for anti-censorship developers and malware developers, every aspect not covered by that notion of security is troublesome:

An oppressive regime needs to tell these developers apart from those developers. They're the same developers.

  • TLS fingerprinting: information in handshake messages, like ciphersuites and extensions, contains implementation detail that can be used to identify a particular proxy implementation.
  • Active probing: an attacker can send active probes to see how the server reacts. Many proxies have special handling for certain types of error, thus giving themselves away. Success by getting caught.
  • Behavioral fingerprinting: for example, long-lived connections need to be kept alive. TCP keepalive, TLS heartbeat, or an application-level message? A feature. Time between keepalive messages? A feature. Whether to reconnect if a message is lost? A feature.
  • Flow analysis: TLS padding is rarely used in practice. Even when used, padding sizes are usually uniformly distributed, so an average cancels them out. The attacker then proceeds with the (direction, length, timing) triple of every message. Sure, the last two elements can never be exact, but they can still contribute to classification.

TLS-in-TLS detection is just a specialized version of flow analysis, because the length and timing of a browser-initiated TLS handshake are too typical. It takes no machine learning to classify: the Client Hello is always 517 bytes, the Certificate flight is huge (around 4 KiB), and the timing is predictable, like the first Application Data right after Change Cipher Spec. People have been classifying applications this way for a long time.
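As a toy illustration of that last point (a minimal sketch, not any censor's actual implementation): if the inner record were visible, the check would be a few integer comparisons. A real on-path classifier only sees the outer ciphertext lengths, i.e. the inner length plus a fixed framing overhead, but the arithmetic is just as simple.

```python
# Minimal sketch: does this byte string start with a 517-byte browser-style
# Client Hello? Field offsets follow the TLS record header (type, version, length).
import struct

def is_browser_client_hello(record: bytes) -> bool:
    if len(record) < 5:
        return False
    rec_type, major, minor, length = struct.unpack(">BBBH", record[:5])
    return (rec_type == 0x16               # handshake record
            and (major, minor) == (3, 1)   # Client Hello records carry version 0x0301
            and length == 512)             # 5-byte header + 512 = the familiar 517 bytes

hello = b"\x16\x03\x01\x02\x00" + b"\x01" * 512
print(is_browser_client_hello(hello))  # True
```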

@nekohasekai

This comment was marked as off-topic.

@JimmyHuang454

This comment was marked as off-topic.

@RPRX

This comment was marked as off-topic.

@nekohasekai

This comment was marked as off-topic.

@RPRX

This comment was marked as off-topic.

@RPRX

This comment was marked as off-topic.

@RPRX

This comment was marked as off-topic.


nametoolong commented Sep 1, 2023

(continued)

To mitigate these problems, there came uTLS, application fronting, padding, and many more techniques. What is being discussed here is padding.

I wonder if ECH and/or GREASE in the inner layer would (temporarily) confuse those heuristics.

Who knows (except that insider). No one knows what those heuristics are. Fixing broken heuristics to accommodate ECH takes just a few hours.

Padding has been shown to be somewhat effective at hiding inner stream characteristics. Good news: TLS can be arbitrarily shaped. Bad news: TLS can be made really slow by long padding, and effectively hiding the inner TLS stream requires rather long padding. XTLS chose to pad only the first few packets and leave the remainder as raw, unaltered browser traffic, hence the somewhat opinionated breaking of V2Ray's composability.

People diverge on the best way to pad records. NaïveProxy uses short, always-on padding with multiplexing. Is long, specialized padding better, or short, indiscriminate padding? And the draft being discussed has not paid attention to the longer stream. Will XTLS allow an attacker to determine which website you are visiting? Likely, if you are under strong suspicion.

That said, the padding strategy is somewhat subjective and subject (pun intended) to the threat model. Xray has an opinionated threat model yet advertises itself as the objectively better choice. Trojan and Naïve have a very generic threat model, but they can be easily hammered down (with massive collateral damage) in situations like Iran. Is Xray bad for advertising itself that way? WireGuard does the same thing. Xray has done well as the stable go-to solution for many people.

Please at least take a look at the threat model before starting a flamewar. You can get away with nearly any protocol on a very clean IP. You can get away with Trojan-over-Trojan. There are a thousand weird ways to build a usable protocol, yet people want a stable protocol. There are even times when XTLS can get you into trouble, but how can you know that without learning enough to become a developer yourself? It is simply hard to predict which new features to build into a protocol. What if another insider leaks that the GFW is using some other method of detection instead of TLS-in-TLS lengths? Just try to be nice, please.

@TheyCallMeSecond

This comment was marked as off-topic.

@mmmray

This comment was marked as off-topic.


wkrp commented Sep 1, 2023

TLS is unlikely to be confused with a protocol like SSH or SMTP, where it is the server that sends first, rather than the client.

Inside a tunnel, of course, you have freedom to make the communication pattern anything you want it to be. You can specify a scheme for padding and have packet sends operate according to a schedule that is independent of the actual application payload (TLS or whatever it may be). You could even make it look like the tunneled protocol is a "server first" protocol, by having the server send padding at the start, and having the client buffer its initial data until it has received the padding from the server. #9 (comment) has a sketch of such a padding scheme, and "Security Notions for Fully Encrypted Protocols" from this year's FOCI also shows how to achieve arbitrary traffic patterns using buffering.
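Such schedule-driven sending can be sketched as follows. The framing and constants are hypothetical (fixed 512-byte cells of the form [2-byte payload length][payload][zero padding], emitted on a fixed timer regardless of how much application data is queued):

```python
# Hypothetical sketch: a sender emits one fixed-size cell per tick of a fixed
# schedule, independent of how much application data is actually queued.
import struct

CELL = 512       # constant on-the-wire size per send (example value)
INTERVAL = 0.05  # constant send interval in seconds, driven by the caller's timer

def next_cell(queue: bytearray) -> bytes:
    """Build the next cell: [2-byte payload length][payload][zero padding]."""
    payload = bytes(queue[: CELL - 2])
    del queue[: CELL - 2]
    return struct.pack(">H", len(payload)) + payload + b"\x00" * (CELL - 2 - len(payload))

q = bytearray(b"GET / HTTP/1.1\r\n\r\n")
print(len(next_cell(q)))  # 512, whether or not data was queued
print(len(next_cell(q)))  # 512 again: with the queue empty, the cell is pure padding
```

A receiver reads the 2-byte length and discards the rest of each cell, so idle cells are pure padding and the wire pattern is decoupled from the tunneled application.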

Even tunnel protocols that do not nominally support padding might be able to work in this way. Shadowsocks AEAD does not have explicit padding support, but you can simulate padding by encrypting and tagging zero-length ciphertexts. So a Shadowsocks AEAD tunnel could disguise its contents by first having the server send sufficiently many empty ciphertexts to make, say, 200 bytes; the client would buffer its outgoing data until after receiving bytes from the server, and then both sides could continue normally. Something like this is likely to defeat a simple tunneled TLS detector.
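A sketch of that accounting, using Shadowsocks-AEAD-style chunk framing ([sealed 2-byte length][tag][sealed payload][tag]). The `fake_seal` helper below is a stand-in that only reproduces the wire size of a real AEAD seal, and the function names and nonce handling are invented for illustration:

```python
# Hypothetical sketch of "padding via zero-length ciphertexts".
import hashlib, struct

TAG_LEN = 16  # tag size of AES-GCM / ChaCha20-Poly1305

def fake_seal(key: bytes, nonce: int, plaintext: bytes) -> bytes:
    # Stand-in for a real AEAD seal: same wire size (plaintext + 16-byte tag),
    # but NOT real encryption.
    tag = hashlib.blake2s(plaintext, key=key, salt=nonce.to_bytes(8, "little"),
                          digest_size=TAG_LEN).digest()
    return plaintext + tag

def chunk(key: bytes, nonce: int, payload: bytes) -> bytes:
    # Shadowsocks-AEAD-style chunk: [sealed 2-byte length][sealed payload].
    head = fake_seal(key, nonce, struct.pack(">H", len(payload)))
    body = fake_seal(key, nonce + 1, payload)
    return head + body

def padding(key: bytes, target: int, nonce: int = 0) -> bytes:
    # Each zero-length chunk still costs (2 + 16) + (0 + 16) = 34 wire bytes.
    out = b""
    while len(out) < target:
        out += chunk(key, nonce, b"")
        nonce += 2
    return out

pad = padding(b"k" * 32, 200)
print(len(pad))  # 204: six 34-byte empty chunks clear the ~200-byte target
```

On the wire the server would send `pad` before anything else, and the client would buffer its outgoing data until those bytes arrive, as described above.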

Here's an idea for a hack to transform TLS into a "server first" protocol, even when the tunnel protocol does not support padding. You can have the server immediately transmit a fixed prefix of the Server Hello, say \x16\x03, without waiting for a Client Hello. Have the client wait until it receives that fixed prefix, then continue as normal. When it comes time for the server to send its Server Hello, it omits the prefix it has already sent. I don't know: is there something in the TLS specification that would prevent this from working? This specific idea would be easy to detect in itself (because the server always sending 2 bytes is a strong signature), but it could be useful as a test.

This would be easy to implement as a pair of netcat-like wrapper programs. On the client:

import socket, threading

REMOTE = ("...", 443)  # address of the server-side wrapper (fill in)

def pipe(src, dst):  # copy bytes until EOF
    while data := src.recv(65536):
        dst.sendall(data)

client_socket, _ = socket.create_server(("127.0.0.1", 10000)).accept()  # local connection (example port)
server_socket = socket.create_connection(REMOTE)  # remote connection
prefix = server_socket.recv(1)  # wait to receive at least 1 byte from the server
threading.Thread(target=pipe, args=(client_socket, server_socket)).start()  # client-to-server thread
client_socket.sendall(prefix)  # server-to-client: replay the prefix we received...
pipe(server_socket, client_socket)  # ...then copy as normal

On the server:

import socket, threading

def pipe(src, dst):  # copy bytes until EOF
    while data := src.recv(65536):
        dst.sendall(data)

client_socket, _ = socket.create_server(("0.0.0.0", 443)).accept()  # external connection from client (example port)
server_socket = socket.create_connection(("127.0.0.1", 8443))  # upstream connection to the real TLS server (example port)
client_socket.sendall(b"\x16\x03")  # immediately send prefix of Server Hello
threading.Thread(target=pipe, args=(client_socket, server_socket)).start()  # client-to-server thread
prefix = server_socket.recv(2)  # read the prefix of the real TLS Server Hello
if prefix != b"\x16\x03":
    raise SystemExit("prefix mismatch")  # uh oh, what we already sent does not match what the TLS server sent
# discard the real prefix because we have already sent it
pipe(server_socket, client_socket)  # server-to-client


RPRX commented Sep 1, 2023

@wkrp Thanks for the addition. Traffic analysis inside encrypted tunnels is an area with plenty of existing research; back in VLESS BETA I briefly described the two most basic categories:

  • Protocol-induced: e.g. the Socks5 handshake in Socks5-over-TLS; to an observer, different characteristics of this kind on top of TLS are effectively different protocols.
  • Behavior-induced: e.g. how many files are loaded when visiting Google's home page, in what order, and the size of each file; wrapping an extra layer of encryption around this does not effectively mask it.

VLESS Flow and Seed exist to provide different traffic patterns and configurable policies against these threats, including reshaping traffic into the shape of a target website.

Each has its advantages and disadvantages, and recently I have also been studying some rarely-mentioned problems. I'm sure the IETF still has at least a few companion standards to develop and promote after ECH.


wkrp commented Sep 1, 2023

I'm sure the IETF still has at least a few companion standards to develop and promote after ECH.

I have been watching MASQUE, which is developing standards for proxying (even proxying UDP and IP packets) over HTTP/3. It's used in iCloud Private Relay already. From what I have seen of their documents, there is not much in the way of protection against protocol fingerprinting. draft-ietf-masque-quic-proxy-00 briefly mentions padding, in the context of input–output correlation:

An attacker on both sides of the proxy can use the size of ingress and egress packets to correlate packets belonging to the same connection. (Absent client-side padding, tunnelled packets will typically have a fixed amount of overhead that is removed before their HTTP Datagram contents are written to the target.)

HTTP/2, QUIC, and TLS have provisions for padding, but they are kind of "inside-out" from how we would prefer them to be. It's easier for our purposes if the padding is the outer layer: padding(TLS), not TLS(padding).

This recent Tor proposal is related. It doesn't have specific solutions, more of an overview of the problem (also including things like circuit tagging that are specific to Tor).

"Prioritizing Protocol Information Leaks in Tor"
https://gitlab.torproject.org/tpo/core/torspec/-/blob/7ca7ed317a7d0dc668b6ff1608377324ecaf937e/proposals/344-protocol-info-leaks.txt

1.2.1. Handshakes with unique traffic patterns

Certain aspects of Tor's handshakes are very unique and easy to fingerprint,
based only on observed traffic timing and volume patterns. In particular, the
onion client and onion service handshake activity is fingerprintable with
near-zero false negatives and near-zero false positive rates, as per
[ONIONPRINT].

1.3.3. Passive Application-Layer Traffic Patterns

This category of information leak occurs after a client has begun using a
circuit, by analyzing application data traffic.

Examples of this class of information leak include:

  • Website traffic fingerprinting
  • End-to-end correlation

But I think your approach is in the right direction. We need to get away from simplistic padding schemes that only add padding or split packets, like traffic morphing and obfs4's iat-modes, and actually decouple the traffic patterns of the tunneled application from the traffic patterns of the tunnel itself. One question I am still struggling with is what the tunnel's traffic sending schedule should actually be; i.e., client-first or server-first, what mix of burst sizes and directionality, etc. Cf. #255 (comment).


mmmray commented Sep 1, 2023

One question I am still struggling with is what the tunnel's traffic sending schedule should actually be; i.e., client-first or server-first, what mix of burst sizes and directionality, etc. Cf. #255 (comment).

The ideal pattern would be aligned with the traffic of the target website in the case of XTLS or domain fronting. However, in those deployments the proxy does not know what normal traffic to the target website looks like.

If I (as an Xray operator) operated the target website myself, I would have precise traffic measurements of the real website from which I could extract patterns. I therefore think it would be ideal if the pattern itself were part of the client configuration, similar to the SNI in domain fronting, and if the protocol left a generous amount of API surface for custom patterns. Ideally there would be both 1) an initial pattern in the client config for bootstrapping, and 2) alongside the transmitted data, a place for the server to tell the client which pattern to use next.

VLESS Flow and Seed exist to provide different traffic patterns and configurable policies to address these threats, including XTLS/Xray-core#1567 (comment) into the shape of a target website.

I found this document with the same info; are there details on what can be (or has to be) configured? Or are Flow/Seed still WIP?


diwenx commented Sep 2, 2023

ideal pattern would be aligned with traffic of the target website in the case of XTLS or domain-front.

I think this might be where ECH can actually be helpful. Precisely matching the traffic patterns of a somewhat "static" website can be challenging to get every detail right. Targeting a "generic" domain used as the outer ECH SNI (e.g., cloudflare-ech.com) should be less demanding.

wkrp reopened this Sep 2, 2023

klzgrad commented Sep 2, 2023

there is at least one writeup in progress about (1), but as far as I know it's not public yet

Is it ok to discuss this paper publicly when it was shared privately?

To answer it directly, I am not aware of any article available yet in English on the topic of either (1) how TLS-in-TLS detection might work conceptually or (2) whether or how TLS-in-TLS detection is done by the GFW. As has been alluded to in the thread, there is at least one writeup in progress about (1), but as far as I know it's not public yet. As for (2), there is some evidence in the form of user blocking reports, but as yet I'm not aware of any controlled experiment.

The formation of the concept of TLS-in-TLS in the community implied a belief in a certain heuristic, namely that this is a more important traffic feature than others. But research in this field typically studies general traffic classification with general ML models rather than heuristics and hand-crafted classifiers, and as a result there is less insight into the explainability of those models: which features are the structurally more important factors, and why.

And the reason for this belief in the heuristic is that it is doubly difficult to build a general traffic obfuscator that can reliably defeat a general traffic classifier, compared with building the classifier itself: there are no publicly available general traffic classifiers in this space (the ones deployed in China, Iran, etc.) that could be used to study their behavior, build an adversarial general obfuscator (i.e. the traffic morphers and shapers mentioned above), and then experimentally verify its performance against the classifiers. So if, instead, the most important factor for the general traffic classifiers can be identified, it is much easier and more realistic to implement specific, scope-limited countermeasures to mitigate and hinder them.

@nametoolong

Is it ok to discuss this paper publicly when it was shared privately?

To be fair, I was not contacted by the original authors. The authors of said paper posted a graph in a public thread. I kept looking at the graph until I found out it is related to an unpublished research. I am very sorry about leaking the details and have redacted related information from my comments.

it is doubly more difficulty to build a general traffic obfuscator

This. SSRoT is another project that just works. It survived despite glaring features like the original SSR's. Does that mean SSRoT's obfuscation is effective? No; it is because no one cared enough.

net4people deleted a comment from mobilelifeful Sep 3, 2023

RPRX commented Sep 3, 2023

Yesterday @yuhan6665 and I contacted the author of the paper again to ask when it will be published. The author said: "The paper was submitted for review and accepted, but we are still making some revisions, such as emphasizing that the Vision results were produced by a separate, more targeted classifier, to avoid misunderstandings", and "we expect it to be published in late September".

@Testeera
Copy link

Testeera commented Sep 3, 2023

昨天我和 @yuhan6665 再次联系了那篇论文的作者询问何时发布,作者说“那篇论文提交审核且通过了 不过我们还在进行一些修改 比如强调vision的结果是另一个更有针对性的classifier做出来的 以避免一些误会”,“预计在九月下旬就会发布”

哪国的作者

What country is the author from?

@RPRX
Copy link

RPRX commented Sep 3, 2023

此外我想讨论一下机器学习的性能、可解释性与误报率问题,以避免一些误解。我不是这方面的专家,但我确实有过一些研究,并实际训练、部署过一些模型。

很多人对机器学习或深度学习的印象是“耗资源”,其实不完全准确。因为对于绝大多数模型,明显的“耗资源”仅是“训练”,即 不断调整各处权重 这一过程(当然采集数据、标注也算,防杠),而训练好后它基本上就是一个固定的算法,部署它需要的资源相对来说非常少。对于时序流量这样相对不复杂的数据源,可以确定的是训练模型、部署模型所需的资源都相对更少。

可解释性这个问题,主要是因为训练出来的模型可能只是“以奇怪的方式刚好 work”,不直观,人类只是知道它 work,但看不懂它是怎么 work 的,当然参数量大的话更看不懂。不过,对于相对不复杂的数据源,控制变量进行测试即可探究出它的原理。

最后是误报率,在资源相同的情况下,针对性越强误报率越低,这个应该不说就能想到。实际上我想说的是,我认为对 GFW 而言它要封 TLS 类会有一些 权重,无论是手写的还是内化到模型里(端到端),而非仅凭单一特征,这样可以有效降低它的误杀率


Additionally I'd like to discuss the performance, interpretability vs. false positives of machine learning to avoid some misunderstandings. I am not an expert in this area, but I have done some research and actually trained and deployed some models.

Many people have the impression that machine learning or deep learning is "resource intensive", which is not entirely accurate. Because for the vast majority of models, the obvious "resource intensive" part is only the "training"; that is, the process of constantly adjusting the weights (of course, data collection and labeling also count), and after training there is basically a fixed algorithm, deployment of which requires very few resources. For a relatively uncomplicated data source such as time-series traffic, it is certain that the resources needed both to train and deploy the model are relatively few.

Interpretability is a problem mainly because the trained model may just "work in a weird way", which is not intuitive. Humans just know it works, but they can't understand how it works, and even more so if the number of parameters is large. However, for relatively uncomplicated data sources, testing with control variables will allow you to find out how it works.

Finally, regarding the false positive rate: all else being equal, the more targeted a classifier is, the lower its false positive rate; this should go without saying. What I actually want to say is that I think, for the GFW to block TLS-based proxies, it would use some combination of weights, either handwritten or internalized into a model (end-to-end), rather than relying on a single feature, so as to effectively reduce its false positive rate.
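As a hedged illustration of the base-rate point above (all numbers are made up, not measurements of any real censor): even a detector with a very low false positive rate produces mostly false alarms when proxy flows are a tiny fraction of all traffic, which is why combining weighted features to push the rate down matters so much.

```python
# Hypothetical numbers, for illustration only: how the base rate dominates
# the precision of a censorship classifier.
def precision(tpr: float, fpr: float, base_rate: float) -> float:
    """Fraction of flagged flows that are actually proxy flows."""
    tp = tpr * base_rate          # true positives per flow observed
    fp = fpr * (1.0 - base_rate)  # false positives per flow observed
    return tp / (tp + fp)

# Suppose 1 in 10,000 flows is a proxy; the detector catches 95% of them
# and misfires on 0.1% of normal traffic.
p = precision(tpr=0.95, fpr=0.001, base_rate=1e-4)
# Under these assumptions, fewer than 1 in 10 flagged flows is a real proxy.
```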

@RPRX
Copy link

RPRX commented Sep 4, 2023

ideal pattern would be aligned with traffic of the target website in the case of XTLS or domain-front.

I think this might be where ECH can actually be helpful. When precisely matching the traffic patterns of a somewhat "static" website, it is challenging to get every detail right. Targeting a "generic" domain used as the outer ECH SNI (e.g., cloudflare-ech.com) should be less demanding.

我觉得 ECH 是这些技术的 IETF 版,也有一些共同的问题。从技术上来讲,一个明确的 "generic" domain 确实看起来更加合理,但从实践上来讲,这样的事情不会被 GFW 所接受。因为它绝对不会允许人们大规模地不挂任何代理而 直接浏览它想封锁的网站,否则依附于 GFW 的存在而建立起的一系列审查制度都会变成一纸空文。对 GFW 来说,若无法精准封锁,那就只剩全封这一个选择。就像 HTTP 时代它还能检测一下你的关键字,而 HTTPS 时代若你不配合屏蔽一些内容,那就会封掉你的整个域名。

I think ECH is the IETF version of these technologies, and there are some common problems. Technically, an explicit "generic" domain does seem to make more sense, but practically, something like this wouldn't be acceptable to GFW. It will never allow people to browse sites it wants to block on a large scale without any proxies, otherwise all the censorship that has been built up around the existence of GFW would be rendered useless. For GFW, if you can't block it precisely, then you have no choice but to block it all. Just like in the HTTP era, it can still detect your keywords, while in the HTTPS era, if you don't cooperate in blocking some content, it will block your whole domain.

@RPRX
Copy link

RPRX commented Sep 8, 2023

that this is a more important traffic feature than others

继续这个话题,我觉得 TLS-in-whatever 特征不一定是 more important,不过它至少比较明显、影响广泛、易于被审查者所利用:

  • TLS 有比较固定的握手包长度、通信模式等,故代理 TLS 时,若不进行针对性处理就很容易暴露,我们知道,审查者也知道
  • TLS 又是最常见的流量类型,无论代理本身是什么形式,被代理的流量基本上就是 TLS,最大化了这个问题的影响程度
  • 注意对于“正常”的非代理类流量来说,几乎不会出现这一特征,这就使得审查者可以区分“正常”流量和代理类流量

以上这些因素叠加导致了如果我们要写一个“尽量不暴露自己是代理”的代理,就不得不以某种形式处理 TLS-in-whatever 特征。

Continuing on this topic, I don't think the TLS-in-whatever feature is necessarily more important, but it is at least more obvious, more widespread, and more easily exploited by censors:

  • TLS has fairly fixed handshake packet lengths, communication patterns, etc., so when proxying TLS, a proxy that does not handle these patterns specifically is easily exposed; we know this, and so do the censors.
  • TLS is also the most common type of traffic: regardless of the form the proxy itself takes, the proxied traffic is basically TLS, which maximizes the impact of this problem.
  • Note that for "normal" non-proxy traffic, this feature is almost never present, which allows the censor to distinguish between "normal" traffic and proxy traffic.

The combination of these factors leads to the fact that if we want to write a proxy that "tries not to reveal itself as a proxy", we have to deal with the TLS-in-whatever feature in some way.
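To make the "obvious handshake shape" point concrete, here is a toy heuristic (my own illustration with invented thresholds, not any deployed classifier): inside an already-established tunnel, an inner TLS handshake tends to show up as a small client burst, a large server flight, then another small client burst.

```python
# Toy heuristic, invented thresholds: flag tunneled flows whose first data
# bursts resemble an inner TLS handshake
# (small ClientHello -> large ServerHello..Finished flight -> small Finished).
def looks_like_tls_in_tunnel(bursts):
    """bursts: list of (direction, nbytes); direction '>' = client->server."""
    if len(bursts) < 3:
        return False
    (d1, n1), (d2, n2), (d3, n3) = bursts[:3]
    return (d1 == '>' and 200 <= n1 <= 700 and  # inner ClientHello-sized
            d2 == '<' and n2 >= 2500 and        # inner server flight
            d3 == '>' and n3 <= 300)            # inner client Finished

print(looks_like_tls_in_tunnel([('>', 560), ('<', 4200), ('>', 130)]))   # True
print(looks_like_tls_in_tunnel([('>', 1500), ('<', 800), ('>', 1200)]))  # False
```

A real censor would surely weight many more features than three bursts, but even this crude shape rarely occurs in "normal" non-proxy traffic, which is the asymmetry being described above.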

@wkrp
Copy link
Member

wkrp commented Sep 8, 2023

  • TLS 又是最常见的流量类型,无论代理本身是什么形式,被代理的流量基本上就是 TLS,最大化了这个问题的影响程度

  • TLS is also the most common type of traffic, regardless of the form of the proxy itself, the traffic being proxied is basically TLS, maximizing the impact of the problem.

This is a key point. The censor's goal is not to detect tunneled TLS per se—it is to detect tunnels, period. It's just that because TLS makes up such a large proportion of all traffic, whenever you have any kind of tunnel, at some point you are going to send TLS through it. If you don't do something to disguise the timing and directionality pattern of TLS, then that evidence of tunneling will show through. In other words, it's not the "TLS" part the censor cares about, it's the "tunnel" part. The "TLS" is just a handy identifier because it's so common and it has a characteristic traffic pattern.

@diwenx
Copy link

diwenx commented Sep 8, 2023

TLS is also the most common type of traffic, regardless of the form of the proxy itself, the traffic being proxied is basically TLS, maximizing the impact of the problem.

This is spot on. You cannot avoid generating this signature by simply not using TLS.
And for the exact same reason, the prevalence and fingerprintability of TCP handshakes (3-way hello and 4-way finish) present a vulnerability for layer 2 tunnels like obfuscated VPNs, regardless of whether the VPN itself is TCP- or UDP-based.

If you don't do something to disguise the timing and directionality pattern of TLS, then that evidence of tunneling will show through.

It would be great if browsers could share some responsibility with proxies. Imagine a GREASE extension that could fit into any TLS packet type, serving only to inflate packet sizes. But I guess as long as TLS-in-TLS remains a "niche" security concern affecting only users from select regions, such an initiative may remain unlikely.

@wkrp
Copy link
Member

wkrp commented Sep 8, 2023

It would be great if browsers could share some responsibility with proxies. Imagine a GREASE extension that could fit into any TLS packet type, serving only to inflate packet sizes. But I guess as long as TLS-in-TLS remains a "niche" security concern affecting only users from select regions, such an initiative may remain unlikely.

TLS does have a built-in feature to pad records that are encrypted:

https://www.rfc-editor.org/rfc/rfc8446.html#section-5.2

struct {
    opaque content[TLSPlaintext.length];
    ContentType type;
    uint8 zeros[length_of_padding];
} TLSInnerPlaintext;

https://www.rfc-editor.org/rfc/rfc8446#section-5.4

All encrypted TLS records can be padded to inflate the size of the TLSCiphertext. This allows the sender to hide the size of the traffic from an observer.

When generating a TLSCiphertext record, implementations MAY choose to pad. An unpadded record is just a record with a padding length of zero. Padding is a string of zero-valued bytes appended to the ContentType field before encryption. Implementations MUST set the padding octets to all zeros before encrypting.

Application Data records may contain a zero-length TLSInnerPlaintext.content if the sender desires. This permits generation of plausibly sized cover traffic in contexts where the presence or absence of activity may be sensitive. Implementations MUST NOT send Handshake and Alert records that have a zero-length TLSInnerPlaintext.content; if such a message is received, the receiving implementation MUST terminate the connection with an "unexpected_message" alert.

I don't think you can use this record padding feature in the Client Hello, but there you can use the padding extension:

This memo describes a Transport Layer Security (TLS) extension that can be used to pad ClientHello messages to a desired size.

Here's @ValdikSS's past demonstration of using the padding extension to get around a filter:

https://ntc.party/t/http-headerstls-padding-as-a-censorship-circumvention-method/168
https://ntc.party/t/firefox-for-android-with-tls-padding-for-censorship-circumvention/1725

That was intended for when the TLS is sent without a tunnel around it, but it could also work to break up the traffic signature. Unfortunately, merely adding padding doesn't change the overall directionality of bursts. I'm not sure if it's possible in TLS to, for example, send "no-op" records before the handshake, and in any case changing the directionality would likely require cooperation from the server.
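The TLSInnerPlaintext construction quoted above could be sketched like this (framing only, my illustration rather than code from any TLS library; a real implementation builds this structure and then AEAD-encrypts it):

```python
APPLICATION_DATA = 23  # TLS ContentType value for application_data

def pad_inner_plaintext(content: bytes, target_len: int) -> bytes:
    """Build TLSInnerPlaintext = content || type || zeros, padded so the
    whole structure is target_len bytes (RFC 8446, section 5.4)."""
    min_len = len(content) + 1  # content plus the one ContentType byte
    if target_len < min_len:
        raise ValueError("target shorter than content + type byte")
    padding = b"\x00" * (target_len - min_len)
    return content + bytes([APPLICATION_DATA]) + padding

inner = pad_inner_plaintext(b"GET / HTTP/1.1\r\n", 64)
# All 64-byte records look alike on the wire once encrypted.
```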

@diwenx
Copy link

diwenx commented Sep 8, 2023

but then, i realized, i have no idea how TLS-in-TLS is detected today. is there an (english) summary of it?

I was referred to a paper from this year's SIGCOMM
GGFAST: Automating Generation of Flexible Network Traffic Classifiers
https://dl.acm.org/doi/pdf/10.1145/3603269.3604840

While it doesn't look at TLS-in-TLS specifically, section 7 explores encrypted flow classification, with a subsection on how to classify SMTP flows when tunneled within TLS.

We trained an SMTP classifier, using 25,000 flows of plaintext SMTP traffic <...> evaluated it on the TLS flows of that same dataset, using the TLS sequence-of-lengths variant.

The methodology proposed in the paper might provide insights into detecting TLS within TLS. The basic premise is to train a classifier on plaintext protocols (like plain TLS) using features that remain stable/visible post-encryption such as packet sizes, direction, and timing. This classifier can then be applied to the payload part of encrypted flows.

It seems that their classifiers achieved pretty decent precision for detecting SMTP-in-TLS.

Only a small fraction (0.4%) of other non-SMTP TLS flows are mislabeled as SMTP <...> out of the 14,474 false positives, 9,200 correspond to IMAP-over-TLS and POP3-over-TLS traffic. Although these are still false positives, these protocols are adjacent to SMTP and have very similar syntax.
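As I read it, the GGFAST-style approach trains on features that survive encryption, such as the signed sequence of packet lengths. A loose sketch (not the paper's actual feature set or model; the centroids below are invented stand-ins for trained per-protocol averages):

```python
# Loose sketch in the spirit of sequence-of-lengths classification
# (not GGFAST's actual code or features).
def seq_of_lengths(packets, k=8):
    """Signed lengths of the first k packets: positive = client->server."""
    feat = [n if d == '>' else -n for d, n in packets[:k]]
    return feat + [0] * (k - len(feat))  # zero-pad short flows

def nearest_centroid(feat, centroids):
    """Label of the closest centroid by squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(feat, centroids[label]))

# Invented centroids standing in for averages learned from plaintext traffic.
centroids = {
    "smtp-in-tls": [60, -90, 30, -45, 500, -40, 700, -40],
    "https":       [550, -4200, 130, 400, -1400, -1400, -1400, 80],
}
flow = [('>', 58), ('<', 88), ('>', 30), ('<', 44),
        ('>', 480), ('<', 40), ('>', 650), ('<', 38)]
label = nearest_centroid(seq_of_lengths(flow), centroids)  # "smtp-in-tls"
```

The point of the paper's methodology, as applied to our problem, is that the same pipeline trained on plain TLS handshakes could be pointed at the payload portion of an encrypted tunnel.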

@klzgrad
Copy link

klzgrad commented Sep 15, 2023

TLS does have a built-in feature to pad records that are encrypted

And the HTTP/2 protocol also has built-in padding fields, but in implementations of both protocols padding is mostly an afterthought, and it's annoying to try to create padding through these implementations' existing APIs without patching their cores. In terms of sustaining long-term maintenance, I'd prefer not to use the forgotten built-in paddings.

One question I am still struggling with is what the tunnel's traffic sending schedule should actually be; i.e., client-first or server-first, what mix of burst sizes and directionality

I have this intuition so far: the tunnel's traffic schedule should parrot what the tunnel "should" look like as if it were not a tunnel. As an example, I have an HTTP/2 tunnel that sends ALPN with h2 and is based on actually existing HTTP/2 implementations; then the tunnel payload should be reorganized into what a regular HTTP/2 connection would look like: a series of 50-200 byte requests and a bunch of large downloads. The issue of directionality can be explained away by HTTP/2 pipelining and multiplexing, e.g. even though the TLS handshakes look like several ping-pong round trips, it is just the client sending CSS requests first and image requests later. I don't know if this passes the dead parrot test or not; just a thought.

The struggle is probably in coming up with a general traffic schedule, but if there is a more specific scope for parroting, it is easier to narrow down the target distribution. But not too specific: this is not the classic definition of parroting, as we are not parroting a particular application or protocol in terms of its structure, but in terms of its traffic distribution. The straightforward, brute-force way would be to train a generative model given sufficient data on an entire class of target traffic, and use that to generate the traffic schedule you want. But I hope there are cheaper heuristics that just raise the floor of detection high enough to achieve circumvention.

Edit: One more thing. I believe the traffic schedule should be more general than site-based. Re:

If I (as Xray operator) operated the target website myself, I would have precise traffic measurements to the real website that I can extract patterns from

There are several issues with this level of specificity. It's not economical to require every operator to generate their own traffic schedules, as it requires highly automated tooling for generation (which OS, which browser? Generate a schedule per OS/browser? What about updates or concept drift?), and more tools for verifying that the generated schedules are actually OK (a. the operator can inadvertently generate a traffic profile of e.g. Google.com that is known to every website fingerprinter, if they choose to mirror it; b. is it even possible to have this kind of adversarial tooling?). The schedules generated from the data of one website, due to its limited scope, may be too specific and risk becoming the classic parrot.
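The "reorganize payload into request-sized writes" idea from earlier in this comment could be sketched as a simple re-chunker (my interpretation, with the 50-200 byte range taken from the HTTP/2 example above; real shaping would also control timing and direction):

```python
import random

# Sketch: re-chunk client->server tunnel payload so writes look like a
# series of small HTTP/2 request-sized frames. Sizes drawn uniformly
# from 50-200 bytes, an assumption rather than a measured distribution.
def requestlike_chunks(payload: bytes, rng: random.Random):
    chunks, i = [], 0
    while i < len(payload):
        n = rng.randint(50, 200)
        chunks.append(payload[i:i + n])
        i += n
    return chunks

rng = random.Random(1)
chunks = requestlike_chunks(b"x" * 1000, rng)
# The payload is preserved; only the write boundaries change.
```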

@wkrp
Copy link
Member

wkrp commented Sep 19, 2023

But I hope there are cheaper heuristics that just raise the floor of detection high enough to achieve circumvention.

My thoughts are in this direction as well. In website fingerprinting research they always try to quantify the overhead: how much the defense costs, in terms of bandwidth and latency. But for our purposes, it's likely that the very beginning of a connection is, by far, the most important. (Probably just the first few packets, even.) If we traffic-shape just, say, the first 10 KB of a connection in both directions, and revert to "natural" shaping after that, that's likely to put us ahead of the game for a long time, and the overhead will be asymptotically negligible.

Rather than first trying to figure out the question of what a traffic schedule should look like, I'm thinking about the possibility of defining a "challenge" with a few simple schedules for circumvention developers to try implementing. These would not be "strawman" schedules, not designed for effective circumvention, but just to give developers a common target to work towards (as I expect implementing even these will require some internal code restructuring). When a few project have developed the necessary support for shaping traffic according to a schedule, then it will be easier to experiment with alternatives.

This is the kind of thing I am thinking of:

Traffic schedule I (constant rate, no randomness, client and server schedules independent)
Client
  1. Connect to server.
  2. Send a burst of 5 KB.
  3. Sleep 500 ms.
  4. Go to 2.
Server
  1. Wait for incoming connection.
  2. Send a burst of 5 KB.
  3. Sleep 500 ms.
  4. Go to 2.
Traffic schedule II (server starts, random sizes)
Client
  1. Connect to server.
  2. Wait to receive at least 100 bytes from server.
  3. Send an amount of data randomly selected from {120, 170, 250} bytes.
  4. Send 1400x + y bytes, where x is random in {0, …, 5} and y is random in {0, …, 1400}.
  5. Sleep (100 + 100×Beta(1.0, 5.0)) ms.
  6. Go to 4.
Server
  1. Wait for incoming connection.
  2. Send an amount of data randomly selected from {250, 255, 270}.
  3. Wait to receive at least 1000 bytes from client.
  4. Send 1400x + y bytes, where x is random in {0, …, 5} and y is random in {0, …, 1400}.
  5. Sleep (100 + 100×Beta(1.0, 5.0)) ms.
  6. Go to 4.
Traffic schedule III (random sizes, multiple simulated processes, dependence on different kinds of inner messages)
Client
  1. Connect to server.
  2. Send a number of bytes randomly selected from {1000, 1200, 1250}.
  3. Independently:
    1. Send random(100, 4000) bytes, sleep random(10, 50) ms, repeat.
    2. Every 20 s, send a "ping" message of 20 bytes.
Server
  1. Wait for incoming connection.
  2. Wait for at least 1 byte from client.
  3. Independently:
    1. Send random(100, 4000) bytes, sleep random(10, 50) ms, repeat.
    2. Wait for a "ping" message, send a "pong" message of 40 bytes, repeat.
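As one concrete take on the challenge, schedule II's client-side random draws might be generated like this (a minimal sketch with no networking, just the size and sleep computations; the Beta parameters are copied from the schedule above):

```python
import random

# Sketch of traffic schedule II, client side: only the random draws,
# no actual sockets or sleeping.
def schedule2_client_steps(rng: random.Random, rounds: int):
    """Yield (nbytes, sleep_ms) tuples: the step-3 burst, then
    `rounds` iterations of steps 4-5."""
    first = rng.choice([120, 170, 250])  # step 3: initial client burst
    yield (first, 0.0)
    for _ in range(rounds):
        x = rng.randint(0, 5)            # step 4: 1400x + y bytes
        y = rng.randint(0, 1400)
        sleep_ms = 100 + 100 * rng.betavariate(1.0, 5.0)  # step 5
        yield (1400 * x + y, sleep_ms)

rng = random.Random(0)
steps = list(schedule2_client_steps(rng, 3))
# Each tuple tells the sender how much to write and how long to sleep.
```

A real implementation would drive a send loop from these tuples, pulling data (or padding) from a send buffer at each step.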

@yuhan6665
Copy link

yuhan6665 commented Sep 19, 2023

@wkrp thanks for the writeup. Seems like a good coding task for us. So far we have tried to implement a simple and efficient structure in Xray for padding and shaping the first few packets. It only accounts for the number of packets. In the future, we should add more traffic state, like bytes received. We are currently looking to release the customization capability of these schedules to users. I wonder what the suitable way/level of config is. (@RPRX thought of putting everything into a "seed", like in Minecraft)

I also find your choice of specific numbers interesting. Some numbers I can roughly guess: 1400, and ping 20 / pong 40, are common traffic patterns. What about {120, 170, 250} and {250, 255, 270}? Also, what is the probabilistic meaning of choosing the Beta distribution?

@wkrp
Copy link
Member

wkrp commented Sep 20, 2023

The numbers are just some numbers I made up. There's no meaning to them—I was just trying to think of some schedules that might pose some design challenges.

Please do not take the ideas I sketched as recommendations for good traffic schedules. They are bad traffic schedules, in fact. I am just brainstorming ways to make progress towards general and effective traffic shaping. My thinking is that there are two obstacles: (1) current systems need to be rearchitected to be more flexible in the kind of traffic shaping they support, and (2) we need to find out what traffic schedule distributions are practical and effective. I find myself thinking about (2) perhaps too much (as in #281 (comment)), and I reflected that a more productive path forward may be to get more developers thinking about (1). We can tackle problem (1) first, targeting artificial "strawman" traffic schedules; then we'll have the infrastructure necessary to comfortably experiment with problem (2). My idea was that posting a list of concrete "challenges", we can get everyone working on a common problem and thinking about the issues involved.

I didn't intend #281 (comment) to be a final list of recommendations. I think it should get some more thought. But a list of traffic shaping challenges could look something like that.

The beta distribution is just from my intuition that uniform distributions are maybe not the best for natural-looking traffic features. But it's not important: the goal is not to prescribe a specific algorithm for implementation, it's to demonstrate that the software can handle different kinds of distributions. You can replace it with a uniform distribution or whatever. These are not recommendations for anything to be shipped to users, at this point.

What I mean when I talk about design questions involved in traffic shaping, is that implementing even a simple traffic schedule requires at least two things:

  1. A send buffer of outgoing data that is ready to be sent, but is waiting for the traffic scheduler to schedule a send event.
  2. A padding generator to create data to send when the traffic scheduler calls for it, even if there is no "real" data waiting in the send buffer.

These two things are what is required to move beyond simplistic, one-packet-at-time padding, and really decouple the observable traffic features of the tunnel from the traffic features of the application protocol inside the tunnel.

Implementing this properly may require you to turn the main loop of your program "inside-out". I wrote about this in the past and made a sample patch for obfs4proxy:

https://lists.torproject.org/pipermail/tor-dev/2017-June/012310.html

The current implementation, in pseudocode, works like this (transports/obfs4/obfs4.go obfs4Conn.Write):

on recv(data) from tor:
	send(frame(data))

If it instead worked like this, then obfs4 could choose its own packet scheduling, independent of tor's:

on recv(data) from tor:
	enqueue data on send_buffer

func give_me_a_frame(): # never blocks
	if send_buffer is not empty:
		dequeue data from send buffer
		return frame(data)
	else:
		return frame(padding)

in a separate thread:
	buf = []
	while true:
		while length(buf) < 500:
			buf = buf + give_me_a_frame()
		chunk = buf[:500]
		buf = buf[500:]
		send(chunk)
		sleep(100 ms)

The key idea is that give_me_a_frame never blocks: if it doesn't have any application data immediately available, it returns a padding frame instead. The independent sending thread calls give_me_a_frame as often as necessary and obeys its own schedule. Note also that the boundaries of chunks sent by the sending thread are independent of frame boundaries.

I attach a proof-of-concept patch for obfs4proxy that makes it operate in a constant bitrate mode.
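For illustration, the pseudocode above translates into a runnable (single-threaded, toy) sketch like the following; the frame tags and the 500-byte chunk size are placeholders, not a wire format from any real protocol:

```python
from collections import deque

# Toy version of the "inside-out" design: a non-blocking frame source
# backed by a send buffer, plus a chunker that emits fixed-size chunks
# whose boundaries are independent of frame boundaries.
class Shaper:
    def __init__(self):
        self.send_buffer = deque()
        self.carry = b""  # bytes framed but not yet emitted in a chunk

    def enqueue(self, data: bytes):
        self.send_buffer.append(data)

    def give_me_a_frame(self) -> bytes:  # never blocks
        if self.send_buffer:
            return b"D" + self.send_buffer.popleft()  # data frame
        return b"P" + b"\x00" * 64                    # padding frame

    def next_chunk(self, size: int = 500) -> bytes:
        """What the sending thread would transmit each tick."""
        while len(self.carry) < size:
            self.carry += self.give_me_a_frame()
        chunk, self.carry = self.carry[:size], self.carry[size:]
        return chunk

s = Shaper()
s.enqueue(b"hello")
chunk = s.next_chunk()  # 500 bytes: one data frame, then padding frames
```

In a real transport the sending loop would call `next_chunk` on its own timer, exactly as in the pseudocode's separate thread.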

Also compare to the discussion in the recent "Security Notions for Fully Encrypted Protocols":

To avoid traffic analysis based on message length, we give a novel security notion for FEPs called length shaping, in part inspired by real-world concerns. It requires that the protocol be capable of producing any given number p of bytes of valid ciphertext data on command. While protocols like Obfs4 will add specified padding to the input, we require length shaping to apply to the output to provide greater control over the lengths of network messages. Length shaping precludes the existence of a minimum message length, and, more generally, the output lengths can be shaped arbitrarily, such as into a data-independent pattern or that of a different FEP.

In their sample protocol of Figure 1, obuf is the send buffer I talked about, and p‖0p is the padding generator.

@stevejohnson7
Copy link

stevejohnson7 commented Nov 23, 2023

The paper you are seeking about TLS-in-TLS detection has been released; it is to appear at USENIX Security 2024:
https://www.usenix.org/conference/usenixsecurity24/presentation/xue

Source: Xray Telegram Group

@yuhan6665
Copy link

Thanks for sharing. Diwen Xue also recommended another paper of interest
https://www.robgjansen.com/publications/precisedetect-ndss2024.pdf

@klzgrad
Copy link

klzgrad commented Nov 25, 2023

Thanks for sharing. Diwen Xue also recommended another paper of interest https://www.robgjansen.com/publications/precisedetect-ndss2024.pdf

The recommendation suggests that even a detector with less-than-practical precision/false positive rate cannot be underestimated, because it intuitively becomes more powerful when aggregated into coarse-grained, host-level analysis. So obfuscation strength matters quantitatively, and it's always useful to increase it.

A simple countermeasure to host-based analysis is to insert dummy flows at host level. But it may be logistically difficult to generate diverse traffic from diverse sources to the circumvention bridge at low cost.

@wkrp
Copy link
Member

wkrp commented Nov 30, 2023

Thanks for sharing. Diwen Xue also recommended another paper of interest https://www.robgjansen.com/publications/precisedetect-ndss2024.pdf

There is a thread now for this paper, with a summary: #312.
