Failed FTPS and HTTPS connections to Java services after a kernel upgrade

Hristofor Pamyatnih
4 min read · Jul 18, 2019


Just another Friday detective story.

A couple of weeks ago all major Linux vendors released fixes for the MDS vulnerabilities in the kernel: CVE-2018-12126, CVE-2018-12130, CVE-2018-12127 and CVE-2019-11091. Two weeks after Red Hat and CentOS released kernel 3.10.0-957.21.3, our customer logged a ticket saying that FTPS transfers had started to fail after the kernel update. The kernel version in that first ticket was wrong, so I ran a sanity check on the OS/kernel as reported and closed the ticket as not reproducible. I also wrote: “There is a 90% chance that the issue is caused by a blocking random number generator, because the transfers stalled rather than failed, and that smells like a lack of entropy in the system.”

The next day a colleague from support called me: they had the issue with the latest version of the kernel, and we now had an environment where it was reproducible. I installed the latest kernel, rebooted the machine and voilà, I was not able to log in with an FTPS client.

Just to note: our product is a Java-based MFT solution supporting various protocols, and there is an Apache FTP server behind our FTPS implementation.

I made a copy of the product instance on another machine with the previous kernel version and tested it. Everything was fine. So the problem was in the kernel. A Java application stops working after a kernel upgrade? Are you kidding me?

A comparison of the traffic dumps from both machines showed that the server was not sending the certificate bundle during the handshake. The frame was just lost. No error messages, no logs, no exceptions, nothing. I checked the release notes of the kernel. The only changes were:

Bugs fixed (https://bugzilla.redhat.com/):

1719123 - CVE-2019-11477 Kernel: tcp: integer overflow while processing SACK blocks allows remote denial of service
1719128 - CVE-2019-11478 Kernel: tcp: excessive resource consumption while processing SACK blocks allows remote denial of service
1719129 - CVE-2019-11479 Kernel: tcp: excessive resource consumption for TCP connections with low MSS allows remote denial of service

So I started digging into the code changes. Nothing looked suspicious, but I’m definitely no kernel developer. After some shooting in the dark I found the following counter in the output of netstat -s:

TCPWqueueTooBig: 776
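
The same counter can also be read without netstat. Here is a minimal sketch in Python, assuming the usual Linux layout of /proc/net/netstat (pairs of lines: counter names first, then values). The function names and the sample text are made up for illustration:

```python
def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {group: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    groups = {}
    # Lines come in pairs: "Group: name1 name2 ..." then "Group: v1 v2 ...".
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        group, names = names_line.split(":", 1)
        values_group, values = values_line.split(":", 1)
        if group != values_group:
            raise ValueError("mismatched lines: %r / %r" % (group, values_group))
        groups[group] = dict(zip(names.split(), (int(v) for v in values.split())))
    return groups


def tcp_wqueue_too_big(text):
    """Return the TCPWqueueTooBig counter, or None if it is not exported."""
    return parse_proc_netstat(text).get("TcpExt", {}).get("TCPWqueueTooBig")


# Hypothetical, heavily shortened sample; a real file has many more counters.
SAMPLE = (
    "TcpExt: SyncookiesSent TCPWqueueTooBig\n"
    "TcpExt: 0 776\n"
    "IpExt: InNoRoutes InOctets\n"
    "IpExt: 0 12345\n"
)

print(tcp_wqueue_too_big(SAMPLE))
```

On a live system the text would come from reading /proc/net/netstat. A kernel without the SACK patches does not export the counter at all, in which case the helper returns None.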

Well, that does not look familiar. Why not dig in this direction? And here is what I found:

TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.

A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.

And the code change:

+	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
+		return -ENOMEM;
+	}

It looks like the FTP server is trying to put too much data into the socket.
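
To get a feel for the check in the patch, here is a minimal sketch of the same comparison in Python. The function name and the numbers are illustrative assumptions, not values measured on the affected system:

```python
def split_refused(sk_wmem_queued, sk_sndbuf):
    """Mirror the patched condition: refuse the split when half of the
    queued memory exceeds the socket's send buffer limit."""
    return (sk_wmem_queued >> 1) > sk_sndbuf


# Illustrative numbers only, not taken from the affected machines.
SNDBUF = 4608
for queued in (4000, 9216, 9218, 20000):
    print(queued, split_refused(queued, SNDBUF))
```

In other words, the kernel tolerates queued memory up to about twice sk_sndbuf. Beyond that it returns -ENOMEM instead of splitting the skb, so the smaller the send buffer, the sooner the threshold is reached.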

And let the fun begin.

I took the latest version of the Apache FTP server, 1.1.1, copied all the certificates and tried to reproduce the issue, but I found that the part of the code that sends the certificate bundle to the client is our own custom code. So no, it can’t be fixed by a library upgrade.

So we have an issue with buffer size. Let’s check whether we can play with the numbers. After some blind tries I found that increasing receiveBufferSize from 512 to 4096 solves the problem. The automated tests passed without any new issues or performance degradation, and other protocols were not affected, so I forgot about the story for two weeks.
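
When playing with buffer sizes it helps to see what the kernel actually grants, since the value an application requests is not necessarily the value the socket ends up with. Here is a minimal sketch in Python using the standard SO_SNDBUF and SO_RCVBUF socket options. Whether the product's receiveBufferSize setting maps directly onto SO_RCVBUF is an assumption, and the helper name is made up:

```python
import socket


def granted_buffer_sizes(requested):
    """Request the same send/receive buffer size on a fresh TCP socket
    and return what the OS reports back as (sndbuf, rcvbuf)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sndbuf, rcvbuf


# Compare the two sizes mentioned in the text.
for size in (512, 4096):
    print(size, granted_buffer_sizes(size))
```

On Linux the reported values are typically larger than the requested ones, because the kernel doubles the request to leave room for bookkeeping overhead and enforces a minimum size.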

Two weeks later we received a similar bug report from another customer. They had upgraded the kernel to 2.6.32-754.15.3 (RHEL 6 in this case), and somehow certificate authentication over HTTPS stopped working.

I was happy, because I received a tarball with the customer’s exact setup and simply restored it to my VM. First I tested on RHEL 7 with the latest kernel. Everything behaved as expected: FTPS was hanging, but HTTPS was OK. I built a new VM with the latest RHEL 6 and copied the product instance onto it. The big surprise: on RHEL 6 FTP was working, but the SSL handshake over HTTPS hung after the server certificate chain was sent to the client, until the session timed out, and only then was the rest of the handshake data sent. Damn, the HTTP daemon exposed no tunable buffer settings. So let’s try it the hard way.

This page became my favorite over the last few days. I knew that the application was trying to put all the data into one TCP frame, but the OS was refusing to fragment it. I dug into the settings related to TCP fragmentation, and just when I was getting desperate, I had a lucky shot:

tcp_mtu_probing - INTEGER
Controls TCP Packetization-Layer Path MTU Discovery. Takes three
values:
0 - Disabled
1 - Disabled by default, enabled when an ICMP black hole detected
2 - Always enabled, use initial MSS of tcp_base_mss.

The default setting was zero. Setting it to 2 resolved the issue. The magic command:

sysctl -w net.ipv4.tcp_mtu_probing=2
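
Note that sysctl -w changes the running kernel only; the value is lost on reboot unless it is also written to a sysctl configuration file such as /etc/sysctl.conf. To verify the active mode from a script, here is a minimal sketch in Python that reads the corresponding /proc/sys entry. The helper names are made up, and the descriptions are paraphrased from the documentation excerpt above:

```python
MODES = {
    0: "disabled",
    1: "disabled by default, enabled when an ICMP black hole is detected",
    2: "always enabled, initial MSS of tcp_base_mss",
}


def describe_tcp_mtu_probing(raw):
    """Turn the raw sysctl value (e.g. "2") into a readable description."""
    value = int(raw.strip())
    return "%d (%s)" % (value, MODES.get(value, "unknown mode"))


def read_tcp_mtu_probing(path="/proc/sys/net/ipv4/tcp_mtu_probing"):
    """Read the current setting from procfs (Linux only)."""
    with open(path) as fh:
        return describe_tcp_mtu_probing(fh.read())
```

Calling read_tcp_mtu_probing() after the sysctl command should then report mode 2.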

Automation and performance testing lie ahead, but I think we will have just another happy customer.
