Deobfuscating JavaScript code: how to start?

The easiest way to obfuscate code is to remove unnecessary whitespace and to shorten the names of variables and functions. A couple of years ago I made this simple tool to reformat such obfuscated JavaScript code into something more readable.
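To make the first obfuscation technique concrete, here is a minimal Haskell sketch of the whitespace-removal step. It is not taken from any real minifier, and the names stripWhitespace and isWordChar are hypothetical. It assumes a simplified notion of identifier characters and drops every run of whitespace unless that run separates two such characters, as in "var s".

```haskell
import Data.Char (isAlphaNum, isSpace)

-- Simplified: characters that may appear in a JavaScript identifier or
-- number literal (the full Unicode identifier rules are ignored).
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '_' || c == '$'

-- Naive whitespace stripper: a run of whitespace is dropped entirely,
-- unless it separates two word characters, in which case it becomes a
-- single space. String literals, regexes and automatic semicolon
-- insertion are NOT handled, so this can break real code.
stripWhitespace :: String -> String
stripWhitespace = go Nothing
  where
    -- the first argument is the last character copied to the output
    go :: Maybe Char -> String -> String
    go _ [] = []
    go prev s@(c : _)
      | isSpace c =
          let rest = dropWhile isSpace s
          in case (prev, rest) of
               (Just p, n : _) | isWordChar p && isWordChar n -> ' ' : go (Just ' ') rest
               _ -> go prev rest
    go _ (c : cs) = c : go (Just c) cs

main :: IO ()
main = putStrLn (stripWhitespace "var s = true;\nif (s) {\n  x = 1;\n}")
-- prints: var s=true;if(s){x=1;}
```

Combined with renaming identifiers to one or two letters, this is enough to produce code that looks like the example below.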

Below is an example of how the code (here, Google Analytics' ga.js) can look after obfuscation, before it is converted into a more readable form.

(function(){var s=true,t=false,aa=window,u=undefined,v=Math,ba="push",fa="slice",ga="cookie",y="charAt",z="indexOf",A="gaGlobal",ha="getTime",ja="toString",B="window",D="length",E="document",F="split",G="location",ka="href",H="substring",I="join",L="toLowerCase";var la="_gat",ma="_gaq",na="4.8.6",oa="_gaUserPrefs",pa="ioo",M="&",N="=",O="__utma=",qa="__utmb=",ra="__utmc=",sa="__utmk=",ta="__utmv=",ua="__utmz=",va="__utmx=",wa="GASO=";var xa=function(){var j=this,h=[],k="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";j.uc=function(m){h[m]=s};j.Nb=function()

[...]

After converting it to this friendlier form, we can spot functions, variables, loops and other constructs. Although the code is still far from beautiful, it looks much better.

(function()
{
 var s = true,t = false,aa = window,u = undefined,v = Math,ba = "push",fa = "slice",ga = "cookie",y = "charAt",z = "indexOf",A = "gaGlobal",ha = "getTime",ja = "toString",B = "window",D = "length",E = "document",F = "split",G = "location",ka = "href",H = "substring",I = "join",L = "toLowerCase";
 var la = "_gat",ma = "_gaq",na = "4.8.6",oa = "_gaUserPrefs",pa = "ioo",M = "&",N = " = ",O = "__utma = ",qa = "__utmb = ",ra = "__utmc = ",sa = "__utmk = ",ta = "__utmv = ",ua = "__utmz = ",va = "__utmx = ",wa = "GASO = ";
 var xa = function()
 {
  var j = this,h = [],k = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  j.uc = function(m)
  {
   h[m] = s
   }
  ;
  j.Nb = function()
  {
   for(var m = [],i = 0;
   i<h[D];
   i++)if(h[i])m[v.floor(i/6)]^ = 1<<i%6;
   for(i = 0;
   i<m[D];
   i++)m[i] = k[y](m[i]||0);
   return m[I]("")
   }
  
  }
 ,ya = new xa;
 function Q(j)
 {
  ya.uc(j)
  }
 ;
       [...] 
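The listing also shows why the result is still hard to read: property names are hidden behind the string variables declared at the top (D is "length", y is "charAt", I is "join"), so h[D] really means h.length. One possible next step, which is not part of the tool described here, is to inline these aliases. The sketch below assumes a hypothetical alias table copied by hand from the first var statement (only four entries) and works on raw text, so it ignores string literals and local variables that might shadow an alias; inlineAliases is a hypothetical name.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical alias table, copied by hand from the first var statement
-- of the obfuscated file (only a few entries).
aliases :: [(String, String)]
aliases = [("D", "length"), ("y", "charAt"), ("I", "join"), ("F", "split")]

isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_' || c == '$'

-- Rewrite obj[ALIAS] as obj.property when ALIAS is in the table; any other
-- bracket access (such as m[i]) is copied unchanged.
-- Naive: string literals are not skipped and shadowing is not detected.
inlineAliases :: [(String, String)] -> String -> String
inlineAliases table = go
  where
    go ('[' : rest) =
      let (name, after) = span isIdentChar rest
      in case (lookup name table, after) of
           (Just prop, ']' : rest') -> '.' : prop ++ go rest'
           _ -> '[' : go rest
    go (c : rest) = c : go rest
    go [] = []

main :: IO ()
main = putStrLn (inlineAliases aliases "i<h[D];m[i]=k[y](m[i]||0);return m[I](\"\")")
-- prints: i<h.length;m[i]=k.charAt(m[i]||0);return m.join("")
```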

The obfuscated code can be turned into this more readable form with the Haskell script below; its behavior is documented in the comments.

import Data.List (stripPrefix)

main :: IO ()
main = do
    let sourceFile = "google-analytics.com ga.js"
    let outputFile = "parsed.js"
    content <- readFile sourceFile
    -- 1. newlines are in random, pointless places, so remove all of them
    let noNewlines = replace "\n" "" content
    -- 2. add a newline after every ; sign
    let splitStatements = replace ";" ";\n" noNewlines
    -- 3. the { and } signs should each stand alone on a line
    let openBraces = replace "{" "\n{\n" splitStatements
    let closeBraces = replace "}" "\n}\n" openBraces
    -- 4. add a space after every comma
    let spacedCommas = replace "," ", " closeBraces
    -- 5. surround every = sign with spaces (naive: this also splits
    --    operators such as == or ^= and changes "=" inside string literals)
    let spacedEquals = replace "=" " = " spacedCommas
    -- 6. add indentation and save the result to the output file
    writeFile outputFile (addIndents spacedEquals 0)

-- Copy the input character by character; after each newline insert one
-- tab per currently open { block.
addIndents :: String -> Int -> String
addIndents "" _ = ""
addIndents (c:rest) indentLevel =
    c : makeIndent indentLevel' (c == '\n') ++ addIndents rest indentLevel'
  where
    indentLevel' = indentLevel + case c of
                                     '{' -> 1
                                     '}' -> -1
                                     _   -> 0

-- Build the indentation that follows a newline. A negative level (caused by
-- unbalanced braces, e.g. inside strings) yields no indentation instead of
-- recursing forever.
makeIndent :: Int -> Bool -> String
makeIndent i True | i > 0 = '\t' : makeIndent (i - 1) True
makeIndent _ _ = ""

-- Replace every occurrence of old with new in xs (old must not be empty).
replace :: (Eq a) => [a] -> [a] -> [a] -> [a]
replace _ _ [] = []
replace old new xs@(y:ys) =
    case stripPrefix old xs of
        Nothing  -> y : replace old new ys
        Just ys' -> new ++ replace old new ys'

This naive algorithm can introduce minor errors into the code (in the listing above, for example, ^= became ^ = and the string "=" became " = "), but I think it is still useful. Of course, the results would be better with a grammar-based parser (built with yacc+lex or Parsec, for example), but that would be harder to implement, so I think this solution has a sufficient quality-to-effort ratio.
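A full grammar is more work, but the two errors visible in the listing above (^ = and " = ") can be avoided with a small hand-written scanner instead of a plain text replace. The sketch below is a possible substitute for step 5, not a yacc or Parsec grammar; spaceAssignments, stringBody and operatorChars are hypothetical names. It copies string literals verbatim and spaces only a lone =, leaving compound operators such as ==, <= and ^= intact. It still does not recognise regular-expression literals or comments.

```haskell
-- Characters that, directly before '=', make it part of a compound
-- operator such as +=, ^=, <=, == or !=.
operatorChars :: String
operatorChars = "=!<>+-*/%&|^"

-- Split the body of a string literal (opening quote q already consumed)
-- into the text up to and including the closing quote, and the rest of
-- the input. Backslash escapes such as \" are copied verbatim.
stringBody :: Char -> String -> (String, String)
stringBody q = loop
  where
    loop [] = ([], [])
    loop ('\\' : c : cs) = let (body, rest) = loop cs in ('\\' : c : body, rest)
    loop (c : cs)
      | c == q = ([c], cs)
      | otherwise = let (body, rest) = loop cs in (c : body, rest)

-- Surround a lone '=' with spaces; compound operators and the contents of
-- string literals are left untouched.
spaceAssignments :: String -> String
spaceAssignments = go ' '
  where
    -- the first argument is the last character copied to the output
    go :: Char -> String -> String
    go _ [] = []
    go _ (q : rest)
      | q == '"' || q == '\'' =
          let (body, rest') = stringBody q rest
          in q : body ++ go q rest'
    go prev ('=' : rest)
      | prev `notElem` operatorChars && not (startsOperator rest) =
          " = " ++ go ' ' rest
    go _ (c : rest) = c : go c rest

    -- '=' followed by '=' or '>' belongs to ==, === or =>
    startsOperator ('=' : _) = True
    startsOperator ('>' : _) = True
    startsOperator _ = False

main :: IO ()
main = putStrLn (spaceAssignments "var N=\"=\",O=\"__utma=\";x^=1<<2;if(a==b)c=d")
-- prints: var N = "=",O = "__utma=";x^=1<<2;if(a==b)c = d
```

The scanner only looks one character back and one ahead, so it stays almost as simple as replace while keeping the JavaScript valid in these two cases.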

1 comment:

  1. The JavaScript code in the first example is horrible; I can't imagine why someone would write it like this (variable names, no whitespace, etc.).
