
Troubleshooting a case of no available HTTP service in the cluster

off999 2025-03-06 18:27

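A while ago we noticed that one of our services occasionally became unavailable. This post records how we tracked the problem down.

The symptom: at certain times every day the service became unstable, and API calls intermittently failed.

The production setup is a three-tier lvs-nginx-tomcat stack. We first checked the tomcat monitoring and found nothing unusual: response times and error codes looked normal, and so did CPU, IO and the other metrics.

The nginx monitoring told a different story: at a certain moment the 5xx errors for this service spiked, then recovered after about 7 or 8 seconds.

Looking for more clues on the nginx servers, we found that nginx logged the following error at that time: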

2015/12/24 10:30:38 [error] 13433#0: check time out with peer: 10.79.40.1xx:80

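In production, nginx probes a fixed URI on every backend server once per second. A response with HTTP status 200 counts as healthy. After 3 consecutive failed probes the failing server is removed from rotation, and it is restored once a probe succeeds again.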
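The removal rule can be sketched as a small state machine. This is only an illustration of the behaviour described above, not nginx code: the class name `BackendHealth`, the method names, and the use of -1 to represent a timed-out probe are all assumptions made for the example.

```java
/** Hypothetical model of the probe rule: 3 consecutive failures remove a backend, one success restores it. */
final class BackendHealth {
    // Threshold taken from the text: 3 consecutive failed probes.
    private static final int FALL_THRESHOLD = 3;

    private int consecutiveFailures = 0;
    private boolean inRotation = true;

    /** Records one probe result. Only HTTP 200 counts as success; -1 stands for a timeout (assumption). */
    void onProbe(int httpStatus) {
        if (httpStatus == 200) {
            consecutiveFailures = 0;
            inRotation = true;            // a single success restores the backend
        } else {
            consecutiveFailures++;
            if (consecutiveFailures >= FALL_THRESHOLD) {
                inRotation = false;       // removed until a probe succeeds again
            }
        }
    }

    boolean isInRotation() {
        return inRotation;
    }

    public static void main(String[] args) {
        BackendHealth backend = new BackendHealth();
        for (int status : new int[] {200, -1, -1, -1, 200}) {
            backend.onProbe(status);
            System.out.println("probe=" + status + " inRotation=" + backend.isInRotation());
        }
    }
}
```

If every backend reaches the removed state at the same time, nothing is left to serve traffic, which is the situation the error log points to.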

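The logs show that at the time of the incident the probes to all backend tomcats failed, so nginx removed every backend server. During that window requests returned 502.

The nginx log only tells us that the probes got no response. Did they actually reach tomcat? Since probes are sent once per second, we went to the tomcat access log and filtered out all probe requests sent by a single nginx.

The filtered log shows that probe requests are missing from roughly 7:00:10 to 7:00:40.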
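Spotting missing probes in an access log amounts to looking for gaps between consecutive timestamps. The sketch below assumes the probe timestamps have already been extracted and sorted as whole seconds. The class name `ProbeGaps` and the tolerance parameter are hypothetical and not part of the original investigation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds windows in which per-second probe requests are missing. */
final class ProbeGaps {
    /**
     * Returns {previous, next} pairs for every pair of consecutive timestamps
     * that are more than maxGapSeconds apart. Input must be sorted ascending.
     */
    static List<long[]> findGaps(long[] sortedSeconds, long maxGapSeconds) {
        List<long[]> gaps = new ArrayList<>();
        for (int i = 1; i < sortedSeconds.length; i++) {
            long delta = sortedSeconds[i] - sortedSeconds[i - 1];
            if (delta > maxGapSeconds) {
                gaps.add(new long[] {sortedSeconds[i - 1], sortedSeconds[i]});
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Made-up sample: probes logged at seconds 8, 9, 10, then nothing until 15.
        long[] seen = {8, 9, 10, 15, 16};
        for (long[] gap : findGaps(seen, 1)) {
            System.out.println("no probes logged between " + gap[0] + " and " + gap[1]);
        }
    }
}
```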

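The load on the front-end machines was not high, so our first guess was a network problem between nginx and the tomcat servers. Statistics from the production logs showed that the affected machines were concentrated in one network segment and that the failures clustered at a few times of day, which seemed to support the guess.

Up to this point it was only a suspicion. To verify it we tried to reproduce the problem: we deployed a simple script on the nginx machine that used curl to send one request per second to the same tomcat. The result was puzzling: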

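Probe method | URI | HTTP version | Interval | Source server | Target server | Problem seen
nginx        | /   | 1.0          | 1s       | nginx         | tomcat        | yes
curl         | /   | 1.0          | 1s       | nginx         | tomcat        | no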

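This did not match our guess. With no better option, we captured packets on both ends to check the network.

Comparing the tomcat-side and nginx-side captures: the tomcat server received the request at 7:00:10 and replied with an ACK, nginx timed out and closed the connection at 7:00:13, and tomcat only returned data at 7:00:15. That ruled out the network.

So the focus moved to tomcat itself: what was tomcat doing while it handled the problematic request?

Again with a simple script, we ran jstack repeatedly during the window in which the problem tends to occur and looked at the catalina threads that were in RUNNABLE state when it happened. One frame in those stacks looked very suspicious.

The service still runs the rather old tomcat 6.0.32. Its source shows that after parsing the request headers, tomcat runs the following code: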

MessageBytes valueMB = headers.getValue("host");

// Check host header
if (http11 && (valueMB == null)) {
    error = true;
    // 400 - Bad request
    if (log.isDebugEnabled()) {
        log.debug(sm.getString("http11processor.request.prepare")+
                  " host header missing");
    }
    response.setStatus(400);
    adapter.log(request, response, 0);
}

parseHost(valueMB);

....

/**
 * Parse host.
 */
public void parseHost(MessageBytes valueMB) {

    if (valueMB == null || valueMB.isNull()) {
        // HTTP/1.0
        // Default is what the socket tells us. Overriden if a host is
        // found/parsed
        request.setServerPort(socket.getLocalPort());
        InetAddress localAddress = socket.getLocalAddress();
        // Setting the socket-related fields. The adapter doesn't know
        // about socket.
        request.serverName().setString(localAddress.getHostName());
        return;
    }

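In other words, if the request carries no Host header, tomcat uses the hostname of its own server as the host attribute of the request object, and it obtains that hostname by calling getHostName() on the socket's local address.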
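To get a feel for how expensive that call can be, the same `InetAddress.getHostName()` call can be timed in isolation. The following is a minimal sketch, not tomcat code: the class `ReverseLookupTimer` and its `Result` holder are hypothetical, and the loopback address in `main` is only a placeholder that usually resolves quickly from the hosts file. An address with no hosts entry would exercise the DNS path instead.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper that times the reverse lookup behind InetAddress.getHostName(). */
final class ReverseLookupTimer {
    /** Resolved name (or the literal IP when no name is found) plus the time the call took. */
    static final class Result {
        final String name;
        final long elapsedMillis;

        Result(String name, long elapsedMillis) {
            this.name = name;
            this.elapsedMillis = elapsedMillis;
        }
    }

    static Result timeReverseLookup(byte[] rawAddress) {
        final InetAddress address;
        try {
            // Building the object from raw bytes does not perform any lookup.
            address = InetAddress.getByAddress(rawAddress);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("address must be 4 or 16 bytes long", e);
        }
        long start = System.nanoTime();
        // This is the call that may block while the system resolver does a reverse lookup.
        String name = address.getHostName();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Result(name, elapsedMillis);
    }

    public static void main(String[] args) {
        Result result = timeReverseLookup(new byte[] {127, 0, 0, 1});
        System.out.println(result.name + " resolved in " + result.elapsedMillis + " ms");
    }
}
```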

Comparing the production nginx probe request with the curl request shows that the nginx probe indeed carries no headers at all, while curl sends 3 headers by default:

curl:

GET / HTTP/1.0

Host: localhost:8080

User-Agent: curl/7.43.0

Accept: */*

nginx:

GET / HTTP/1.0

This confirms that a request without a Host header can trigger the problem. We have found where the thread hangs. The next question is why it hangs there.
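The difference between the two requests can be checked mechanically. The sketch below scans the head of a raw HTTP request for a Host header. It is a simplified illustration, not a full HTTP parser and not what tomcat does internally, and the class name `HostHeaderCheck` is made up for the example.

```java
/** Hypothetical check: does a raw HTTP request head contain a Host header? */
final class HostHeaderCheck {
    static boolean hasHostHeader(String requestHead) {
        String[] lines = requestHead.split("\r?\n");
        // lines[0] is the request line, e.g. "GET / HTTP/1.0"; headers start at index 1.
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // blank line marks the end of the headers
            }
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Host")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String curlRequest = "GET / HTTP/1.0\r\nHost: localhost:8080\r\n"
                + "User-Agent: curl/7.43.0\r\nAccept: */*\r\n\r\n";
        String nginxProbe = "GET / HTTP/1.0\r\n\r\n";
        System.out.println("curl has Host:  " + hasHostHeader(curlRequest));
        System.out.println("nginx has Host: " + hasHostHeader(nginxProbe));
    }
}
```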

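First question: what does getHostByAddr, the call where the thread hangs, actually do? In the JDK source, the method is declared in

jdk/src/share/classes/java/net/Inet4AddressImpl.java

String getHostByAddr(byte[] addr) throws UnknownHostException;

Digging further into getHostByAddr, the corresponding native implementation is in jdk/src/solaris/native/java/net/Inet6AddressImpl.c: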

/*
 * Class:     java_net_Inet6AddressImpl
 * Method:    getHostByAddr
 * Signature: (I)Ljava/lang/String;
 */
JNIEXPORT jstring JNICALL
Java_java_net_Inet6AddressImpl_getHostByAddr(JNIEnv *env, jobject this,jbyteArray addrArray) {
    jstring ret = NULL;

#ifdef AF_INET6
    char host[NI_MAXHOST+1];
    int error = 0;
    int len = 0;
    jbyte caddr[16];

    if (NET_addrtransAvailable()) {
        struct sockaddr_in him4;
        struct sockaddr_in6 him6;
        struct sockaddr *sa;

        /*
         * For IPv4 addresses construct a sockaddr_in structure.
         */
        if ((*env)->GetArrayLength(env, addrArray) == 4) {
            jint addr;
            (*env)->GetByteArrayRegion(env, addrArray, 0, 4, caddr);
            addr = ((caddr[0]<<24) & 0xff000000);
            addr |= ((caddr[1] <<16) & 0xff0000);
            addr |= ((caddr[2] <<8) & 0xff00);
            addr |= (caddr[3] & 0xff);
            memset((void *) &him4, 0, sizeof(him4));
            him4.sin_addr.s_addr = (uint32_t) htonl(addr);
            him4.sin_family = AF_INET;
            sa = (struct sockaddr *) &him4;
            len = sizeof(him4);
        } else {
            /*
             * For IPv6 address construct a sockaddr_in6 structure.
             */
            (*env)->GetByteArrayRegion(env, addrArray, 0, 16, caddr);
            memset((void *) &him6, 0, sizeof(him6));
            memcpy((void *)&(him6.sin6_addr), caddr, sizeof(struct in6_addr) );
            him6.sin6_family = AF_INET6;
            sa = (struct sockaddr *) &him6 ;
            len = sizeof(him6) ;
        }

        error = (*getnameinfo_ptr)(sa, len, host, NI_MAXHOST, NULL, 0,
                                   NI_NAMEREQD);

        if (!error) {
            ret = (*env)->NewStringUTF(env, host);
        }
    }
#endif /* AF_INET6 */

    if (ret == NULL) {
        JNU_ThrowByName(env, JNU_JAVANETPKG "UnknownHostException", NULL);
    }

    return ret;
}

getnameinfo_ptr is defined in jdk/src/solaris/native/java/net/net_util_md.c:

getnameinfo_ptr = (getnameinfo_f)
    JVM_FindLibraryEntry(RTLD_DEFAULT, "getnameinfo");

So the call actually goes to the glibc library function. From the getnameinfo man page:

DESCRIPTION

The getnameinfo() function is the inverse of getaddrinfo(3): it converts a socket address to a corresponding host and service,

in a protocol-independent manner. It combines the functionality of gethostbyaddr(3) and getservbyport(3), but unlike those

functions, getaddrinfo(3) is reentrant and allows programs to eliminate IPv4-versus-IPv6 dependencies.

The sa argument is a pointer to a generic socket address structure (of type sockaddr_in or sockaddr_in6) of size salen that

holds the input IP address and port number. The arguments host and serv are pointers to caller-allocated buffers (of size

hostlen and servlen respectively) into which getnameinfo() places null-terminated strings containing the host and service names

respectively.

The caller can specify that no hostname (or no service name) is required by providing a NULL host (or serv) argument or a zero

hostlen (or servlen) argument. However, at least one of hostname or service name must be requested.

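From the man page and the calling context we can infer that this function looks up a host name from an IP address. But how does it do the lookup? To keep reading code, we first need to know which glibc version the operating system uses.

Compile any C program on the machine and use ldd to find the path of the libraries it depends on: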

[root@localhost test]# ldd a.out

linux-vdso.so.1 => (0x00007fff595ff000)

libc.so.6 => /lib64/libc.so.6 (0x0000003e60e00000)

/lib64/ld-linux-x86-64.so.2 (0x0000003e60600000)

[root@localhost test]# /lib64/libc.so.6

GNU C Library stable release version 2.12, by Roland McGrath et al.

Copyright (C) 2010 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.

There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A

PARTICULAR PURPOSE.

Compiled by GNU CC version 4.4.7 20120313 (Red Hat 4.4.7-4).

Compiled on a Linux 2.6.32 system on 2013-11-21.

Available extensions:

The C stubs add-on version 2.1.2.

crypt add-on version 2.1 by Michael Glad and others

GNU Libidn by Simon Josefsson

Native POSIX Threads Library by Ulrich Drepper et al

BIND-8.2.3-T5B

RT using linux kernel aio

libc ABIs: UNIQUE IFUNC

For bug reporting instructions, please see:

<http://www.gnu.org/software/libc/bugs.html>.

https://www.gnu.org/software/libc/download.html

Download the glibc source for that version from the GNU site. The source shows that getnameinfo calls gethostbyaddr:

while (__gethostbyaddr_r ((const void *) &(((const struct sockaddr_in *)sa)->sin_addr),
                          sizeof(struct in_addr), AF_INET,
                          &th, tmpbuf, tmpbuflen,
                          &h, &herrno))
  if (herrno == NETDB_INTERNAL && errno == ERANGE)
    tmpbuf = extend_alloca (tmpbuf, tmpbuflen, 2 * tmpbuflen);
  else
    break;
}

The gethostbyaddr function contains this fragment:

switch (af) {
case AF_INET:
    (void) sprintf(qbuf, "%u.%u.%u.%u.in-addr.arpa",
                   (uaddr[3] & 0xff),
                   (uaddr[2] & 0xff),
                   (uaddr[1] & 0xff),
                   (uaddr[0] & 0xff));
    break;
case AF_INET6:
    qp = qbuf;
    for (n = IN6ADDRSZ - 1; n >= 0; n--) {
        qp += SPRINTF((qp, "%x.%x.",
                       uaddr[n] & 0xf,
                       (uaddr[n] >> 4) & 0xf));
    }
    strcpy(qp, "ip6.arpa");
    break;
default:
    abort();
}

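This code reverses the IP address octet by octet (8 bits at a time) and appends the ".in-addr.arpa" suffix. The query is then sent as a DNS request through the common resolver functions and eventually reaches res_mkquery. From its man page (man 3 res_mkquery):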

The res_mkquery() function constructs a query message in buf of length buflen for the domain name dname. The query type op is usually QUERY, but can be any of the types defined in <arpa/nameser.h>. newrr is currently unused.

http://linux.die.net/man/3/res_mkquery
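The IPv4 branch of the glibc fragment above is easy to mirror in a few lines. The following sketch builds the same query name from a dotted-quad string. The class and method names are hypothetical, and the sample address in `main` is made up.

```java
/** Hypothetical helper that builds the reverse-lookup (PTR) query name for an IPv4 address. */
final class ReverseDnsName {
    static String ptrNameForIpv4(String dottedQuad) {
        String[] octets = dottedQuad.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dottedQuad);
        }
        StringBuilder name = new StringBuilder();
        // Emit the octets in reverse order, as the sprintf call in glibc does.
        for (int i = 3; i >= 0; i--) {
            int value = Integer.parseInt(octets[i]);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("octet out of range: " + octets[i]);
            }
            name.append(value).append('.');
        }
        return name.append("in-addr.arpa").toString();
    }

    public static void main(String[] args) {
        System.out.println(ptrNameForIpv4("10.79.40.1")); // prints 1.40.79.10.in-addr.arpa
    }
}
```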

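The DNS request implementation is fairly complex, so we will not expand on it here.

There is a shortcut: we can write a minimal C program to see roughly what getnameinfo does: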

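test.c:

#include <arpa/inet.h>
#include <err.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main() {
    struct sockaddr_in ip;
    const char *ipstr = "127.0.0.1";
    int err;
    char host[NI_MAXHOST+1];

    memset(&ip, 0, sizeof(ip));
    /* Parse into sin_addr. If &ip is passed instead, sin_addr is left unset
     * and the lookup goes out for 0.0.0.0, which is the 0.0.0.0.in-addr.arpa
     * query visible in the strace output below. 127.0.0.1 itself is normally
     * answered from /etc/hosts without any DNS query. */
    if (!inet_aton(ipstr, &ip.sin_addr))
        errx(1, "can't parse IP address %s", ipstr);
    ip.sin_family = AF_INET;

    printf("noop\n");
    /* First call: the resolver also loads its configuration files. */
    err = getnameinfo((struct sockaddr *)&ip, sizeof(ip), host, NI_MAXHOST, NULL, 0, NI_NAMEREQD);
    printf("start\n");
    /* Second call: only the lookup itself remains. */
    err = getnameinfo((struct sockaddr *)&ip, sizeof(ip), host, NI_MAXHOST, NULL, 0, NI_NAMEREQD);
    printf("end\n");
    return 0;
}

[root@localhost test]# gcc test.c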

Then trace the system calls with strace:

[root@localhost test]# strace ./a.out

execve("./a.out", ["./a.out"], [/* 26 vars */]) = 0

brk(0) = 0x17d5000

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd7000

access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)

open("/etc/ld.so.cache", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=59784, ...}) = 0

mmap(NULL, 59784, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff894dc8000

close(3) = 0

open("/lib64/libc.so.6", O_RDONLY) = 3

read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\356\341`>\0\0\0"..., 832) = 832

fstat(3, {st_mode=S_IFREG|0755, st_size=1926800, ...}) = 0

mmap(0x3e60e00000, 3750152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e60e00000

mprotect(0x3e60f8b000, 2093056, PROT_NONE) = 0

mmap(0x3e6118a000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18a000) = 0x3e6118a000

mmap(0x3e6118f000, 18696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3e6118f000

close(3) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dc7000

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dc6000

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dc5000

arch_prctl(ARCH_SET_FS, 0x7ff894dc6700) = 0

mprotect(0x3e6118a000, 16384, PROT_READ) = 0

mprotect(0x3e6081f000, 4096, PROT_READ) = 0

munmap(0x7ff894dc8000, 59784) = 0

fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd6000

write(1, "noop\n", 5noop

) = 5

socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3

connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)

close(3) = 0

socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3

connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)

close(3) = 0

brk(0) = 0x17d5000

brk(0x17f6000) = 0x17f6000

open("/etc/nsswitch.conf", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=1688, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd5000

read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1688

read(3, "", 4096) = 0

close(3) = 0

munmap(0x7ff894dd5000, 4096) = 0

open("/etc/ld.so.cache", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=59784, ...}) = 0

mmap(NULL, 59784, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff894db6000

close(3) = 0

open("/lib64/libnss_files.so.2", O_RDONLY) = 3

read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360!\0\0\0\0\0\0"..., 832) = 832

fstat(3, {st_mode=S_IFREG|0755, st_size=65928, ...}) = 0

mmap(NULL, 2151824, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff894ba8000

mprotect(0x7ff894bb4000, 2097152, PROT_NONE) = 0

mmap(0x7ff894db4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc000) = 0x7ff894db4000

close(3) = 0

mprotect(0x7ff894db4000, 4096, PROT_READ) = 0

munmap(0x7ff894db6000, 59784) = 0

getpid() = 28054

open("/etc/resolv.conf", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=50, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd5000

read(3, "nameserver 172.16.xx.xx\nnamese"..., 4096) = 50

read(3, "", 4096) = 0

close(3) = 0

munmap(0x7ff894dd5000, 4096) = 0

uname({sys="Linux", node="localhost.localdomain", ...}) = 0

open("/etc/host.conf", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd5000

read(3, "multi on\n", 4096) = 9

read(3, "", 4096) = 0

close(3) = 0

munmap(0x7ff894dd5000, 4096) = 0

open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3

fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)

fstat(3, {st_mode=S_IFREG|0644, st_size=400, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd5000

read(3, "127.0.0.1 localhost localhost."..., 4096) = 400

read(3, "", 4096) = 0

close(3) = 0

munmap(0x7ff894dd5000, 4096) = 0

open("/etc/ld.so.cache", O_RDONLY) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=59784, ...}) = 0

mmap(NULL, 59784, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff894db6000

close(3) = 0

open("/lib64/libnss_dns.so.2", O_RDONLY) = 3

read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\20\0\0\0\0\0\0"..., 832) = 832

fstat(3, {st_mode=S_IFREG|0755, st_size=27424, ...}) = 0

mmap(NULL, 2117880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff8949a2000

mprotect(0x7ff8949a7000, 2093056, PROT_NONE) = 0

mmap(0x7ff894ba6000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7ff894ba6000

close(3) = 0

open("/lib64/libresolv.so.2", O_RDONLY) = 3

read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\00009\240b>\0\0\0"..., 832) = 832

fstat(3, {st_mode=S_IFREG|0755, st_size=113952, ...}) = 0

mmap(0x3e62a00000, 2202248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e62a00000

mprotect(0x3e62a16000, 2097152, PROT_NONE) = 0

mmap(0x3e62c16000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x3e62c16000

mmap(0x3e62c18000, 6792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3e62c18000

close(3) = 0

mprotect(0x3e62c16000, 4096, PROT_READ) = 0

mprotect(0x7ff894ba6000, 4096, PROT_READ) = 0

munmap(0x7ff894db6000, 59784) = 0

socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3

connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.xx.xx")}, 16) = 0

poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])

sendto(3, "\36\247\1\0\0\1\0\0\0\0\0\0\0010\0010\0010\0010\7in-addr\4arp"..., 38, MSG_NOSIGNAL, NULL, 0) = 38

poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])

ioctl(3, FIONREAD, [73]) = 0

recvfrom(3, "\36\247\205\203\0\1\0\0\0\1\0\0\0010\0010\0010\0010\7in-addr\4arp"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.xx.xx")}, [16]) = 73

close(3) = 0

write(1, "start\n", 6start

) = 6

open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=400, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff894dd5000

read(3, "127.0.0.1 localhost localhost."..., 4096) = 400

read(3, "", 4096) = 0

close(3) = 0

munmap(0x7ff894dd5000, 4096) = 0

socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3

connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.xx.xx")}, 16) = 0

poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])

sendto(3, "~\223\1\0\0\1\0\0\0\0\0\0\0010\0010\0010\0010\7in-addr\4arp"..., 38, MSG_NOSIGNAL, NULL, 0) = 38

poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])

ioctl(3, FIONREAD, [73]) = 0

recvfrom(3, "~\223\205\203\0\1\0\0\0\1\0\0\0010\0010\0010\0010\7in-addr\4arp"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.xx.xx")}, [16]) = 73

close(3) = 0

write(1, "end\n", 4end

) = 4

exit_group(4) = ?

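The trace shows that the first lookup reads /etc/nsswitch.conf and /etc/resolv.conf. Later lookups read /etc/hosts first. When nothing is found there, a UDP query (SOCK_DGRAM) is sent to the DNS server, poll waits for the reply, and recvfrom reads the result when one arrives.

To verify what we learned from the code, we captured packets again, this time filtering on port 53 only (the DNS port):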

[root@79-40-151-yf-core logs]# tcpdump -i eth1 port 53

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

18:26:42.135322 IP 10.79.40.xx.35611 > 10.71.16.xx.domain: 59106+ PTR? xx.40.79.10.in-addr.arpa. (43)

18:26:42.135348 IP 10.79.40.xx.35611 > 172.16.xx.xx.domain: 59106+ PTR? xx.40.79.10.in-addr.arpa. (43)

18:26:42.136024 IP 172.16.xx.xx.domain > 10.79.40.xx.35611: 59106 NXDomain 0/0/0 (43)

18:26:42.136258 IP 10.79.40.xx.43068 > 10.71.16.xx.domain: 32990+ PTR? xx.16.71.10.in-addr.arpa. (43)

18:26:42.136276 IP 10.79.40.xx.43068 > 172.16.xx.xx.domain: 32990+ PTR? xx.16.71.10.in-addr.arpa. (43)

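As expected, the machine really is sending queries with the .in-addr.arpa suffix to the DNS server.

A detailed description of this kind of query is available on Wikipedia:
https://en.wikipedia.org/wiki/Reverse_DNS_lookup

The last question: can this query time out, and how long is the timeout? According to the man page, the DNS query timeout is 5 seconds: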

$ man resolv.conf

...

timeout:n

sets the amount of time the resolver will wait for a response from a remote name server before retrying the query via a different name server. Measured in seconds, the default is RES_TIMEOUT (currently 5, see <resolv.h>). The value for this option is silently capped to 30.

...

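This matches the packet capture. Note also that the last argument of poll in the strace output is 5000, which agrees with the default timeout.

The timeout can be configured in /etc/resolv.conf:

options timeout:1

Running strace again, the poll argument indeed changes to 1000.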
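The relationship between the option and the poll argument can be sketched as follows. This is a deliberately simplified reading of resolv.conf, not glibc's parser: it only looks at `options` lines, applies the default of 5 and the cap of 30 quoted from the man page, and converts the result to milliseconds for the first attempt. The class name, method names, and the nameserver address in `main` are hypothetical.

```java
/** Hypothetical, simplified reader for the resolv.conf "timeout:n" option. */
final class ResolvTimeout {
    static final int DEFAULT_SECONDS = 5;  // RES_TIMEOUT according to the man page
    static final int MAX_SECONDS = 30;     // the value is silently capped to 30

    static int timeoutSeconds(String resolvConf) {
        int timeout = DEFAULT_SECONDS;
        for (String line : resolvConf.split("\r?\n")) {
            String[] tokens = line.trim().split("\\s+");
            if (!tokens[0].equals("options")) {
                continue;
            }
            for (int i = 1; i < tokens.length; i++) {
                if (!tokens[i].startsWith("timeout:")) {
                    continue;
                }
                try {
                    int parsed = Integer.parseInt(tokens[i].substring("timeout:".length()));
                    if (parsed > 0) {
                        timeout = Math.min(parsed, MAX_SECONDS);
                    }
                } catch (NumberFormatException ignored) {
                    // Malformed value: keep the previous setting (a simplification).
                }
            }
        }
        return timeout;
    }

    /** Milliseconds passed to poll for the first query attempt. */
    static int firstPollTimeoutMillis(String resolvConf) {
        return timeoutSeconds(resolvConf) * 1000;
    }

    public static void main(String[] args) {
        String conf = "nameserver 192.0.2.1\noptions timeout:1\n";
        System.out.println(firstPollTimeoutMillis(conf)); // prints 1000
    }
}
```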

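But the production machines have a dnsmasq cache configured. Why did the cache not help?

With dnsmasq configured, tcpdump shows query traffic on both the lo and eth1 interfaces. Because the reverse DNS lookup finds no host name, dnsmasq has no result to cache and forwards every request to the real DNS server.

Production has other machines besides the ones in this network segment. Why were they not affected?

On the unaffected machines /etc/hosts contains an entry mapping the local IP to a hostname. Once the lookup is answered from the hosts file, DNS is never queried.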
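The hosts-file short circuit can be illustrated with a small lookup over hosts-style text. This is a sketch of the idea only, not the glibc implementation. The class name `HostsLookup`, the sample address, and the hostname `app01` are made up for the example.

```java
import java.util.Optional;

/** Hypothetical reverse lookup over the contents of a hosts file. */
final class HostsLookup {
    /** Returns the first name listed for the given IP, or empty if the IP has no entry. */
    static Optional<String> nameFor(String hostsContent, String ip) {
        for (String rawLine : hostsContent.split("\r?\n")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue; // blank line or comment only
            }
            String[] fields = line.split("\\s+");
            if (fields.length >= 2 && fields[0].equals(ip)) {
                return Optional.of(fields[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String hosts = "127.0.0.1 localhost localhost.localdomain\n"
                + "10.79.40.1 app01   # made-up entry for illustration\n";
        // An entry exists, so no DNS query would be needed.
        System.out.println(nameFor(hosts, "10.79.40.1").orElse("(would fall through to DNS)"));
        // No entry: this is the case that ends up at the DNS server.
        System.out.println(nameFor(hosts, "10.79.40.2").orElse("(would fall through to DNS)"));
    }
}
```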

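Why did DNS resolution take 5 seconds at that time?

UDP is unreliable and has no retransmission mechanism, so when the network misbehaves the resolver can only wait silently for the timeout. The underlying network problem is not covered here.

How do we fix it?

The first reaction was to upgrade tomcat. The newer tomcat code no longer contains the problematic logic, and after the production service was upgraded to tomcat8 it returned to normal.

If tomcat cannot be upgraded, add a Host header to the nginx probe so that the reverse lookup is never triggered.

If neither is possible, add an entry for the local IP and its hostname to the local hosts file, which avoids the query to the DNS server.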
