The easiest way to obfuscate code is to remove unnecessary whitespace and to shorten the names of variables and functions. A couple of years ago I made a simple tool to parse such obfuscated JavaScript code.
Below is an example of what the code can look like after obfuscation, before it is converted to a more readable form.
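As a rough illustration of those two techniques (not code from the tool described in this post), the Haskell sketch below renames identifiers to the shortest available names and then joins the tokens back together, keeping whitespace only where it is required. It assumes the source has already been split into tokens. `shortNames`, `renaming`, `renameTokens`, `isWord` and `joinTokens` are hypothetical helpers. Real minifiers also respect scoping and skip reserved words, which this toy ignores.

```haskell
import Control.Monad (replicateM)
import Data.Char (isAlphaNum)
import Data.Maybe (fromMaybe)

-- Infinite supply of short identifiers: "a" .. "z", then "aa", "ab", ...
-- (A real minifier would also skip reserved words such as "do" or "if".)
shortNames :: [String]
shortNames = concatMap (\n -> replicateM n ['a' .. 'z']) [1 ..]

-- Pair each original name with the next unused short name.
renaming :: [String] -> [(String, String)]
renaming longNames = zip longNames shortNames

-- Rename tokens that appear in the table; leave everything else alone.
renameTokens :: [(String, String)] -> [String] -> [String]
renameTokens table = map (\tok -> fromMaybe tok (lookup tok table))

-- Word-like tokens (identifiers, keywords, numbers) need a separator.
isWord :: String -> Bool
isWord tok = not (null tok) && all (\c -> isAlphaNum c || c == '_' || c == '$') tok

-- Concatenate tokens, keeping a space only between two word-like tokens,
-- so "var a" stays valid while "a = 0 ;" becomes "a=0;".
joinTokens :: [String] -> String
joinTokens (a : b : rest)
  | isWord a && isWord b = a ++ " " ++ joinTokens (b : rest)
  | otherwise = a ++ joinTokens (b : rest)
joinTokens [t] = t
joinTokens [] = ""
```

For example, `joinTokens (renameTokens (renaming ["counter"]) ["var", "counter", "=", "0", ";", "counter", "++", ";"])` evaluates to `"var a=0;a++;"`.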
(function(){var s=true,t=false,aa=window,u=undefined,v=Math,ba="push",fa="slice",ga="cookie",y="charAt",z="indexOf",A="gaGlobal",ha="getTime",ja="toString",B="window",D="length",E="document",F="split",G="location",ka="href",H="substring",I="join",L="toLowerCase";var la="_gat",ma="_gaq",na="4.8.6",oa="_gaUserPrefs",pa="ioo",M="&",N="=",O="__utma=",qa="__utmb=",ra="__utmc=",sa="__utmk=",ta="__utmv=",ua="__utmz=",va="__utmx=",wa="GASO=";var xa=function(){var j=this,h=[],k="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";j.uc=function(m){h[m]=s};j.Nb=function() [...]
After converting it into a friendlier form, we can spot functions, variables, loops and other constructs. Although the code is still far from beautiful, it looks much better.
(function() { var s = true,t = false,aa = window,u = undefined,v = Math,ba = "push",fa = "slice",ga = "cookie",y = "charAt",z = "indexOf",A = "gaGlobal",ha = "getTime",ja = "toString",B = "window",D = "length",E = "document",F = "split",G = "location",ka = "href",H = "substring",I = "join",L = "toLowerCase"; var la = "_gat",ma = "_gaq",na = "4.8.6",oa = "_gaUserPrefs",pa = "ioo",M = "&",N = " = ",O = "__utma = ",qa = "__utmb = ",ra = "__utmc = ",sa = "__utmk = ",ta = "__utmv = ",ua = "__utmz = ",va = "__utmx = ",wa = "GASO = "; var xa = function() { var j = this,h = [],k = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"; j.uc = function(m) { h[m] = s } ; j.Nb = function() { for(var m = [],i = 0; i<h[D]; i++)if(h[i])m[v.floor(i/6)]^ = 1<<i%6; for(i = 0; i<m[D]; i++)m[i] = k[y](m[i]||0); return m[I]("") } } ,ya = new xa; function Q(j) { ya.uc(j) } ; [...]
We can make it more readable by running it through the Haskell script below; its behavior is documented in the comments.
import Data.List (stripPrefix)

main :: IO ()
main = do
    let sourceFile = "google-analytics.com ga.js"
    let outputFile = "parsed.js"
    content <- readFile sourceFile
    -- 1. newlines are in random, pointless places, remove all of them
    let step1 = replace "\n" "" content
    -- 2. add a newline after every ; sign
    let step2 = replace ";" ";\n" step1
    -- 3. the { and } signs should be alone on their lines
    let step3 = replace "}" "\n}\n" (replace "{" "\n{\n" step2)
    -- 4. add a space after every comma
    let step4 = replace "," ", " step3
    -- 5. surround every = sign with spaces (applied to the result of step 4)
    let step5 = replace "=" " = " step4
    -- add indentation and save to the output file
    writeFile outputFile (addIndents step5 0)

-- Walk the text character by character, tracking the brace depth and
-- emitting one tab per level right after every newline.
addIndents :: String -> Int -> String
addIndents "" _ = ""
addIndents (c:rest) indentLevel =
    c : makeIndent indentLevel' (c == '\n') ++ addIndents rest indentLevel'
  where
    indentLevel' = indentLevel + case c of
        '{' -> 1
        '}' -> -1
        _   -> 0

-- Produce i tabs, but only after a newline; a non-positive depth
-- (unbalanced braces) yields no indentation instead of looping forever.
makeIndent :: Int -> Bool -> String
makeIndent i afterNewline
    | afterNewline && i > 0 = replicate i '\t'
    | otherwise = ""

-- Replace every occurrence of old with new.
replace :: (Eq a) => [a] -> [a] -> [a] -> [a]
replace _ _ [] = []
replace old new xs@(y:ys) = case stripPrefix old xs of
    Nothing  -> y : replace old new ys
    Just ys' -> new ++ replace old new ys'
This naive algorithm can introduce minor errors into the code (for example, it also puts spaces around = signs inside string literals and inside compound operators such as ^=), but I think it is still useful. Of course, the results would be better with a grammar-based parser (built with yacc and lex, or with Parsec), but that would be harder to implement, so I think this solution has a good quality-to-effort ratio.
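Some of these errors come from the replacements not knowing where string literals start and end: in the output above, `N="="` became `N = " = "`. One cheap improvement that stops short of a real parser is to make the replacement skip over quoted strings. The sketch below is a hypothetical helper (`replaceOutsideStrings` is not part of the original script) that could stand in for `replace` in steps 2 to 5, assuming the input contains no regex literals, template literals or comments with quote characters in them.

```haskell
import Data.List (isPrefixOf)

-- Replace every occurrence of `old` with `new`, except inside JavaScript
-- string literals ('...' or "..."), which are copied through unchanged.
-- Limitation: regex literals, template literals and comments are not
-- recognised, so a quote inside them can still confuse the scanner.
replaceOutsideStrings :: String -> String -> String -> String
replaceOutsideStrings old new input
  | null old = input -- an empty pattern would otherwise loop forever
  | otherwise = code input
  where
    -- Scanning ordinary code: start a literal on a quote, else try to match.
    code [] = []
    code s@(c : rest)
      | c == '"' || c == '\'' = c : literal c rest
      | old `isPrefixOf` s = new ++ code (drop (length old) s)
      | otherwise = c : code rest

    -- Scanning a literal opened by quote q: copy escapes verbatim and
    -- return to code mode at the matching closing quote.
    literal _ [] = []
    literal q ('\\' : x : rest) = '\\' : x : literal q rest
    literal q (x : rest)
      | x == q = x : code rest
      | otherwise = x : literal q rest
```

For example, `replaceOutsideStrings "=" " = " "N=\"=\";"` yields `N = "=";`, leaving the string contents alone. Compound operators such as `^=` are still split, which would need at least a small tokenizer to fix.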
The JavaScript code in the first example is horrible; I can't imagine why someone would write it like this (variable names, no whitespace, etc.).