Many commercial steganographic programs use least significant bit (LSB) embedding techniques to hide data in 24-bit color images. We present results from a new steganalysis algorithm that uses a variety of entropy and conditional-entropy features of various image bitplanes to detect the presence of LSB hiding. Our technique uses a Support Vector Machine (SVM) for two-class classification; we use the SVMLight implementation due to Joachims (available at http://svmlight.joachims.org/). A novel Genetic Algorithm (GA) approach was used to optimize the feature set used by the classifier. Results include correct identification rates above 98% and false positive rates below 2%. We applied the technique to data embedded with the steganography programs Steghide and Hide4PGP, whose hiding algorithms are capable of both sequential and distributed LSB embedding. The image library consisted of 40,000 digital images of varying size and content, forming a diverse test set. Training sets consisted of as many as 34,000 images, half "clean" and the other half a disjoint set containing embedded data. The hidden data consisted of files of various sizes and information densities, ranging from very low average entropy (e.g., standard word processing or spreadsheet files) to very high entropy (compressed data). The testing phase used a similarly prepared set, disjoint from the training data. Our work includes comparisons with current state-of-the-art techniques and a detailed study of how the results depend on training set size and the feature sets used.
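To illustrate the kind of bitplane features described above, the following sketch computes entropy and conditional-entropy features from the low-order bitplanes of an RGB image and passes them to a generic SVM classifier. This is not the authors' implementation: the exact feature set, the SVMLight configuration, and the GA feature search are not given in the abstract, and scikit-learn's SVC merely stands in for SVMLight here.

import numpy as np
from sklearn.svm import SVC

def bitplane(channel, k):
    # Bitplane k (0 = LSB) of an 8-bit channel as a 0/1 array.
    return (channel >> k) & 1

def entropy(bits):
    # Shannon entropy (in bits) of a binary plane.
    p1 = bits.mean()
    p = np.array([1.0 - p1, p1])
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def conditional_entropy(bits_a, bits_b):
    # H(A | B) = H(A, B) - H(B) for two binary planes of equal shape.
    joint = np.zeros((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            joint[a, b] = np.mean((bits_a == a) & (bits_b == b))
    pb = joint.sum(axis=0)
    h_joint = -(joint[joint > 0] * np.log2(joint[joint > 0])).sum()
    h_b = -(pb[pb > 0] * np.log2(pb[pb > 0])).sum()
    return float(h_joint - h_b)

def features(rgb_image):
    # Hypothetical feature vector: entropies of bitplanes 0-2 plus the
    # conditional entropy of the LSB plane given plane 1, per color channel.
    feats = []
    for c in range(3):
        ch = rgb_image[:, :, c]
        planes = [bitplane(ch, k) for k in range(3)]
        feats.extend(entropy(p) for p in planes)
        feats.append(conditional_entropy(planes[0], planes[1]))
    return feats

# With X_train / X_test built from clean and stego images (y: 0 = clean,
# 1 = stego), a two-class SVM can then be trained and applied:
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# predictions = clf.predict(X_test)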
KEYWORDS: Algorithm development, Algorithms, Data hiding, Software engineering, Data modeling, Computer security, Digital watermarking, Steganography, Standards development, Digital video discs
The inclusion of data hiding techniques in everything from consumer electronics to military systems is becoming more commonplace. This has resulted in a growing interest in benchmarks for embedding algorithms, which until now has focused primarily on theoretical and product-oriented aspects of algorithms (such as PSNR) rather than the factors that are often imposed by the system (e.g., size, execution speed, complexity). This paper takes an initial look at these latter issues through the application of two simple and well-known sets of software engineering metrics: McCabe complexity and the Halstead software measures. It illustrates an approach that applies these metrics to create a hypothetical, language-independent representation of an algorithm, identifying the encapsulated, measurable components that compose that algorithm. This is the first step in developing a representation that will not only allow comparison between disparate algorithms, but also describe and define algorithms in a way that removes language and platform dependency. Bringing these concepts to their logical conclusion shows how such an approach would give existing benchmarking systems a more in-depth and fair analysis of algorithms in the context of systems as a whole, and decrease the variability that affects the accuracy of the theoretical and product measures used today.
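For concreteness, the two metric families named above can be computed as in the sketch below. This is a generic illustration of the metrics themselves, not the paper's language-independent representation of embedding algorithms, and the counts in the usage example are purely hypothetical.

import math

def mccabe_complexity(decision_points):
    # Cyclomatic complexity of a single-entry routine: the number of
    # branching constructs (if, while, for, case, ...) plus one.
    return decision_points + 1

def halstead_measures(n1, n2, N1, N2):
    # Halstead measures from operator/operand counts:
    # n1, n2 = distinct operators / operands; N1, N2 = total occurrences.
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {"vocabulary": vocabulary, "length": length,
            "volume": volume, "difficulty": difficulty, "effort": effort}

# Hypothetical example: an LSB-embedding loop with 4 branches,
# 10 distinct operators (18 uses) and 8 distinct operands (22 uses):
# mccabe_complexity(4)              -> 5
# halstead_measures(10, 8, 18, 22)  -> volume ~ 166.8, difficulty = 13.75, ...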