About Obfuscation - RetroGuard Documentation

Java bytecode (*.class files) contains all of the information, apart from
comments, that is in Java source (*.java) files. Using a tool called a
decompiler, a hostile competitor can easily reverse engineer your Java
classes. To counter this threat, it is possible to obfuscate your class
files before distributing your software.

The obfuscation process strips all unnecessary information from the
classes. This includes the line number tables, local variable names and source
file names used by debuggers. Also, class, interface, field and method
identifiers are renamed to render them meaningless. The Java virtual machine,
which runs your bytecode, does not care at all about these changes. However,
the decompiled version of these classes is extremely difficult to understand,
frustrating any attempt to reverse engineer your code. The changes that an
obfuscator makes to your Java classes are not reversible - there is no
automated way for a reverse engineer to recover the lost information about
your code.

An additional benefit to obfuscation is a substantial reduction in the size
of your Java classes, due to the removal of unnecessary information and the
replacement of large, human-readable identifiers with small machine generated
names. This size reduction leads to faster download times for your Java
applets, and the ability to pack more features into your MIDlets running
on small devices like cellphones and PDAs.

To determine which classes are to be obfuscated, most obfuscators start at
a single entry point (usually the 'main' method of an application, or the
'Applet'-derived class for an applet), and construct a tree of all classes
accessible from that point. Unfortunately, this method is quite limiting and
works only in simple cases. If your Java code has multiple entry points
(several applications, applets, or JavaBeans), or if it is intended to be
used as a Java library, then this method is just not flexible enough.
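The limitation of the single-entry-point approach can be illustrated with a
small sketch. This is not taken from any particular obfuscator: the class
names ('MainApp', 'ReportApplet' and so on) and the reference graph are
hypothetical, and a real tool would derive the graph from the constant pools
of the class files rather than from a hard-coded map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilitySketch {
    // Hypothetical reference graph: each class maps to the classes it uses.
    static final Map<String, List<String>> REFERENCES = Map.of(
        "MainApp", List.of("Parser", "Util"),
        "Parser", List.of("Util"),
        "Util", List.<String>of(),
        "ReportApplet", List.of("Chart"),
        "Chart", List.of("Util"));

    // Breadth-first walk collecting every class reachable from the entry points.
    static Set<String> reachableFrom(String... entryPoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(List.of(entryPoints));
        while (!pending.isEmpty()) {
            String current = pending.removeFirst();
            if (seen.add(current)) {
                pending.addAll(REFERENCES.getOrDefault(current, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Starting from one entry point misses the applet and its chart class.
        System.out.println(reachableFrom("MainApp"));
        // Listing every entry point covers the whole distribution.
        System.out.println(reachableFrom("MainApp", "ReportApplet"));
    }
}
```

With only 'MainApp' as the starting point, the applet classes are never
visited, which is the situation described above for code with several
entry points.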

Instead, RetroGuard obfuscates all classes and interfaces within a JAR
file. JAR files are the industry standard mechanism for packaging Java classes
for distribution - it is easy to package your classes as a JAR using the 'jar'
utility distributed with the Java Development Kit from Sun Microsystems. Any
number of entry points to the JAR can be specified using a RetroGuard script
file. This allows the obfuscation process to be completely flexible.

A technique used by several obfuscators is to introduce corrupt bytecode
into the obfuscated Java classes. These corruptions are prohibited by the
definitive text, the Java Virtual Machine Specification by Lindholm and
Yellin, but do not happen to be noticed by the current virtual machine
implementations. The corruptions are sufficient to break some of the simpler
decompilers on the market. This class corruption is a very dangerous course
to take, however, since virtual machines will certainly enforce the
constraints of the Specification much more strictly in the future. At
that point, code which uses this 'corrupting obfuscation' will simply fail.

Note that, as of Java SE 6, class file corruptions are disallowed by the
latest virtual machine. From Sun's compatibility notes for Java SE 6:

"Some early bytecode obfuscators produced class files that violated the class
file format as given in the virtual machine specification. Such improperly
formatted class files will not run on the JDK virtual machine, though some of
them could have run on earlier versions of the virtual machine. To remedy
this problem, regenerate the class files with a newer obfuscator that
produces properly formatted class files."

Corruption of classes is unacceptable -
one cannot afford to ship Java bytecode which only sometimes runs, or fails
completely on some virtual machines. For this reason the RetroGuard obfuscator
produces only verifiable bytecode in full compliance with the Java Virtual
Machine Specification. Instead of corrupting the bytecode, RetroGuard uses
heavy overloading of identifiers (multiple uses of method names within a
class) and the introduction of Java source-code keywords as identifiers to
make it almost impossible to understand decompiled Java classes.
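To give a feel for what heavy overloading does to readability, here is a
hand-written sketch. It is not actual RetroGuard output: the 'Account' class
and the renamed class 'a' are hypothetical, and the renaming shown is only
what Java source syntax permits. In bytecode an obfuscator has more freedom,
since names there may also be Java keywords and methods may share a name and
parameter list while differing only in return type, neither of which can be
written in compilable source.

```java
class OverloadingSketch {
    public static void main(String[] args) {
        Account from = new Account(500);
        Account to = new Account(100);
        from.transferTo(to, 200);

        a p = new a(500);
        a q = new a(100);
        p.a(q, 200);

        // Both versions behave identically; only readability differs.
        System.out.println(from.balance() == p.a() && to.balance() == q.a());
    }
}

// Hypothetical original source: the names document the intent.
class Account {
    private int balanceCents;

    Account(int openingCents) { balanceCents = openingCents; }

    int balance() { return balanceCents; }

    void deposit(int amountCents) { balanceCents += amountCents; }

    boolean transferTo(Account target, int amountCents) {
        if (amountCents > balanceCents) return false;
        balanceCents -= amountCents;
        target.deposit(amountCents);
        return true;
    }
}

// Roughly what a decompiler might show after renaming: the class, its field,
// its parameters and three different methods all share the name 'a'.
class a {
    private int a;

    a(int a) { this.a = a; }

    int a() { return a; }

    void a(int a) { this.a += a; }

    boolean a(a a, int b) {
        if (b > this.a) return false;
        this.a -= b;
        a.a(b);
        return true;
    }
}
```

The virtual machine resolves each use of 'a' by its descriptor, so the
program runs unchanged, while a human reader has to work out from context
which 'a' is meant at every occurrence.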

Another technique that is often suggested to prevent decompilation is
encryption of Java classes and the use of a custom classloader to decrypt
them. However, since the decrypted classes can always be intercepted using a
modified version of the 'java.lang.ClassLoader' method 'defineClass', that
technique is fundamentally flawed. The issue is explained very clearly in
Vladimir Roubtsov's article at JavaWorld.
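The weakness can be sketched in a few lines. Everything here is an
assumption made for illustration: the XOR "cipher", the empty class named
'Secret' and the 'intercepted' field are hypothetical. Because
'defineClass' is final and cannot be overridden in an example, the field
simply stands in for what a modified 'java.lang.ClassLoader' would be able
to record at the same point.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ClassLoaderInterceptSketch {
    static final byte KEY = 0x5A; // toy XOR key, an assumption for brevity

    // Build a minimal class file for an empty public class named "Secret".
    static byte[] minimalClassFile() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor version
            out.writeShort(52);                                  // major version
            out.writeShort(5);                                   // constant pool count
            out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
            out.writeByte(1); out.writeUTF("Secret");            // #2 Utf8
            out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8
            out.writeShort(0x0021);                              // public, super
            out.writeShort(1);                                   // this_class
            out.writeShort(3);                                   // super_class
            out.writeShort(0);                                   // interfaces
            out.writeShort(0);                                   // fields
            out.writeShort(0);                                   // methods
            out.writeShort(0);                                   // attributes
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // XOR is its own inverse, so this both "encrypts" and "decrypts".
    static byte[] xor(byte[] data) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ KEY);
        }
        return result;
    }

    static class DecryptingLoader extends ClassLoader {
        private final byte[] encrypted;
        byte[] intercepted; // stand-in for a dump made inside a modified defineClass

        DecryptingLoader(byte[] encrypted) {
            super(null); // bootstrap parent, so "Secret" must come from findClass
            this.encrypted = encrypted;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals("Secret")) throw new ClassNotFoundException(name);
            byte[] plain = xor(encrypted);
            intercepted = plain.clone(); // the plaintext bytes exist at this point
            return defineClass(name, plain, 0, plain.length);
        }
    }

    // Load the class from its encrypted form and return what was captured.
    static byte[] loadAndIntercept(byte[] encrypted) {
        DecryptingLoader loader = new DecryptingLoader(encrypted);
        try {
            loader.loadClass("Secret");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return loader.intercepted;
    }

    public static void main(String[] args) {
        byte[] original = minimalClassFile();
        byte[] captured = loadAndIntercept(xor(original));
        System.out.println(Arrays.equals(original, captured));
    }
}
```

However strong the cipher, the bytes handed to 'defineClass' must be a
plain, valid class file, so whoever controls the runtime can copy them at
that point.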