The easiest way to obfuscate code is to remove unnecessary whitespace and to shorten the names of variables and functions. A couple of years ago I made this simple tool to reformat such obfuscated JavaScript code into a more readable form.
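To make these two techniques concrete, here is a toy sketch in Haskell, the language of the tool below. It is not how real minifiers work and not part of the original tool. renameIdents shortens names from a hand-written table, and stripSpaces drops whitespace except where it separates two identifier characters. All function names are made up for this example, and string literals, comments and regular expressions are deliberately ignored.

```haskell
import Data.Char (isAlphaNum, isSpace)
import Data.Maybe (fromMaybe)

-- Simplified test for identifier characters: letters, digits, _ and $.
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '_' || c == '$'

-- Shorten names: every identifier listed in the (old, new) table is replaced.
-- Keywords and property names are renamed too if they appear in the table.
renameIdents :: [(String, String)] -> String -> String
renameIdents table = go
  where
    go [] = []
    go s@(c : _)
      | isWordChar c = let (w, rest) = span isWordChar s
                       in fromMaybe w (lookup w table) ++ go rest
      | otherwise    = let (other, rest) = break isWordChar s
                       in other ++ go rest

-- Remove whitespace unless it separates two identifier characters,
-- so "var x" keeps its space but "x = 1 ;" becomes "x=1;".
-- Not handled: string literals, comments, regexes and cases like "a + +b".
stripSpaces :: String -> String
stripSpaces = go ' '
  where
    go _ [] = []
    go prev s@(c : cs)
      | isSpace c = case dropWhile isSpace s of
          rest@(n : _) | isWordChar prev && isWordChar n -> ' ' : go ' ' rest
          rest -> go prev rest
      | otherwise = c : go c cs

-- A toy "obfuscator": shorten the names first, then squeeze out whitespace.
obfuscate :: [(String, String)] -> String -> String
obfuscate table = stripSpaces . renameIdents table
```

For example, obfuscate [("counter", "a"), ("increment", "b")] turns "function increment(counter) {\n  return counter + 1;\n}" into "function b(a){return a+1;}".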
An example of how the code can look after obfuscation, before it is converted into a more readable form, is shown below.
(function(){var s=true,t=false,aa=window,u=undefined,v=Math,ba="push",fa="slice",ga="cookie",y="charAt",z="indexOf",A="gaGlobal",ha="getTime",ja="toString",B="window",D="length",E="document",F="split",G="location",ka="href",H="substring",I="join",L="toLowerCase";var la="_gat",ma="_gaq",na="4.8.6",oa="_gaUserPrefs",pa="ioo",M="&",N="=",O="__utma=",qa="__utmb=",ra="__utmc=",sa="__utmk=",ta="__utmv=",ua="__utmz=",va="__utmx=",wa="GASO=";var xa=function(){var j=this,h=[],k="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";j.uc=function(m){h[m]=s};j.Nb=function()
[...]
After converting it into a friendlier form, we can spot functions, variables, loops and other constructs. Although the code is still far from beautiful, it looks much better.
(function()
{
var s = true,t = false,aa = window,u = undefined,v = Math,ba = "push",fa = "slice",ga = "cookie",y = "charAt",z = "indexOf",A = "gaGlobal",ha = "getTime",ja = "toString",B = "window",D = "length",E = "document",F = "split",G = "location",ka = "href",H = "substring",I = "join",L = "toLowerCase";
var la = "_gat",ma = "_gaq",na = "4.8.6",oa = "_gaUserPrefs",pa = "ioo",M = "&",N = " = ",O = "__utma = ",qa = "__utmb = ",ra = "__utmc = ",sa = "__utmk = ",ta = "__utmv = ",ua = "__utmz = ",va = "__utmx = ",wa = "GASO = ";
var xa = function()
{
var j = this,h = [],k = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
j.uc = function(m)
{
h[m] = s
}
;
j.Nb = function()
{
for(var m = [],i = 0;
i<h[D];
i++)if(h[i])m[v.floor(i/6)]^ = 1<<i%6;
for(i = 0;
i<m[D];
i++)m[i] = k[y](m[i]||0);
return m[I]("")
}
}
,ya = new xa;
function Q(j)
{
ya.uc(j)
}
;
[...]
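The output still hides property names behind short aliases: the first var line binds D = "length", I = "join" and y = "charAt", so h[D] simply means h.length. A possible extra pass, sketched below under my own assumptions, rewrites such obj[ALIAS] accesses into dot notation. The alias table is copied by hand from that line, and the names gaAliases and expandAliases are hypothetical. The pass is as naive as the script itself: if the minifier reused a short name such as y for a local variable in an inner scope, that access would be rewritten incorrectly.

```haskell
import Data.Char (isAlphaNum)

-- Simplified test for identifier characters: letters, digits, _ and $.
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '_' || c == '$'

-- Hypothetical table, copied by hand from the first "var" line of the output;
-- only these three aliases are handled in this sketch.
gaAliases :: [(String, String)]
gaAliases = [("D", "length"), ("I", "join"), ("y", "charAt")]

-- Rewrite obj[ALIAS] as obj.property when ALIAS is in the table,
-- e.g. with ("D", "length") the text h[D] becomes h.length.
-- Other bracket accesses such as m[i], m[0] or [] are left untouched.
expandAliases :: [(String, String)] -> String -> String
expandAliases table = go
  where
    go [] = []
    go ('[' : s)
      | (name, ']' : rest) <- span isWordChar s
      , Just prop <- lookup name table
      = '.' : prop ++ go rest
    go (c : s) = c : go s
```

Applied to the line m[i] = k[y](m[i]||0); from the output above, it produces m[i] = k.charAt(m[i]||0); while the local index accesses stay as they are.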
We can make it more readable by running it through the Haskell script below; its behavior is documented in the comments.
import Data.List (stripPrefix)

main :: IO ()
main = do
    let sourceFile = "google-analytics.com ga.js"
    let outputFile = "parsed.js"
    content <- readFile sourceFile
    -- 1. newlines are in random, pointless places, so remove all of them
    let tmpContent = replace "\n" "" content
    -- 2. add a newline after every ; sign
    let content = replace ";" ";\n" tmpContent
    -- 3. the { and } signs should be alone on their lines
    let tmpContent = replace "{" "\n{\n" content
    let content = replace "}" "\n}\n" tmpContent
    -- 4. add a space after every comma
    let tmpContent = replace "," ", " content
    -- 5. surround every = sign with spaces (applied to the result of step 4)
    let content = replace "=" " = " tmpContent
    -- add indentation and save the result to the output file
    writeFile outputFile (addIndents content 0)

-- Copy the text character by character, tracking the brace depth,
-- and insert tabs after every newline
addIndents :: String -> Int -> String
addIndents "" _ = ""
addIndents (c:rest) indentLevel =
    c : indent ++ addIndents rest indentLevel'
  where
    indentLevel' = indentLevel + case c of
        '{' -> 1
        '}' -> -1
        _   -> 0
    -- a line that starts with } gets one tab less, so it lines up with its {
    indent
        | c /= '\n'          = ""
        | take 1 rest == "}" = makeIndent (indentLevel' - 1)
        | otherwise          = makeIndent indentLevel'

-- replicate yields "" for negative counts, so unbalanced braces
-- (e.g. inside string literals) cannot cause an endless loop
makeIndent :: Int -> String
makeIndent i = replicate i '\t'

-- replace every occurrence of old with new
replace :: (Eq a) => [a] -> [a] -> [a] -> [a]
replace _ _ [] = []
replace old new xs@(y:ys) =
    case stripPrefix old xs of
        Nothing  -> y : replace old new ys
        Just ys' -> new ++ replace old new ys'
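The five steps can also be written as a single pure function, which makes them easy to try on a short input in GHCi. This is only a restructuring sketch: prettify is a name chosen for this example, and indentation is left out.

```haskell
import Data.List (stripPrefix)

-- The replace helper from the script above, unchanged.
replace :: Eq a => [a] -> [a] -> [a] -> [a]
replace _ _ [] = []
replace old new xs@(y:ys) =
    case stripPrefix old xs of
        Nothing  -> y : replace old new ys
        Just ys' -> new ++ replace old new ys'

-- Steps 1-5 of the script as one function (read bottom to top).
prettify :: String -> String
prettify =
      replace "=" " = "     -- 5. spaces around =
    . replace "," ", "      -- 4. space after every comma
    . replace "}" "\n}\n"   -- 3. } alone on its line
    . replace "{" "\n{\n"   -- 3. { alone on its line
    . replace ";" ";\n"     -- 2. newline after every ;
    . replace "\n" ""       -- 1. drop the original newlines
```

For instance, prettify "function(){var a=1,b=2;}" yields "function()\n{\nvar a = 1, b = 2;\n\n}\n". The empty line before the closing brace appears because steps 2 and 3 both add a newline there.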
This naive algorithm can introduce minor errors into the code: for example, the operator ^= becomes ^ = (a syntax error), the string "__utma=" becomes "__utma = ", and for loop headers are split over several lines at each ; sign. Still, I think the tool is useful. Of course, the results would be better with a grammar-based parser (like yacc+lex, or Parsec), but that would be harder to implement, so I think this solution has a good quality-to-effort ratio.
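The string-literal errors can be avoided without a full parser. The sketch below is my own assumption, not part of the original tool, and replaceOutsideStrings is a hypothetical name. It works like replace but copies string literals through unchanged, so "__utma=" keeps its value. It does not help with ^=, comments or regular expression literals; for those a real tokenizer is still needed.

```haskell
import Data.List (isPrefixOf)

-- Like replace from the script, but text inside JavaScript string literals
-- ("..." or '...', with backslash escapes) is copied through unchanged.
-- Simplification: comments and regular expression literals are not recognised.
replaceOutsideStrings :: String -> String -> String -> String
replaceOutsideStrings old new = go
  where
    go [] = []
    go (q : s)
      | q == '"' || q == '\'' =
          let (lit, rest) = literal q s
          in q : lit ++ go rest
    go s@(c : cs)
      | old `isPrefixOf` s = new ++ go (drop (length old) s)
      | otherwise          = c : go cs

    -- Split a literal body into (text up to and including the closing
    -- quote q, remaining text). An unterminated literal runs to the end.
    literal :: Char -> String -> (String, String)
    literal _ [] = ([], [])
    literal q ('\\' : e : s) = let (l, r) = literal q s in ('\\' : e : l, r)
    literal q (c : s)
      | c == q    = ([c], s)
      | otherwise = let (l, r) = literal q s in (c : l, r)
```

Used in step 5 instead of replace, it turns N="=",O="__utma=" into N = "=",O = "__utma=" while still producing ^ = for the ^= operator.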
The JavaScript code in the first example is horrible; I can't imagine why someone would write it like this (variable names, no whitespace, etc.).