2016 Second International Conference on Computational Intelligence & Communication Technology
Analysis of Different Text Steganography Techniques: A Survey
Shivani Sharma (1), Dr. Avadhesh Gupta (2), Munesh Chandra Trivedi (3), Virendra Kumar Yadav (4)
(1, 3, 4) Department of Computer Science & Engineering, ABES Engineering College, Ghaziabad, India
(2) Institute of Management Studies, Ghaziabad, India
While designing a secure steganographic system, the following points are considered: (i) the system should be a single, practical, stand-alone system that meets the user's requirements of confidentiality, authenticity and integrity; (ii) the user should be provided with a covert channel to hide secret communication; (iii) based on the technical and physical requirements, the user should have a balanced parameter selection option [3]. Before implementing any steganographic technique, both sender and receiver must agree on a mutual key exchange mechanism. There are several text steganography techniques; some of them are discussed in this paper.
Abstract—Steganography helps individuals send confidential data between two parties. It enables the user to hide data in different digital media. Steganography is of many types, such as image steganography, text steganography and audio/video steganography. Text steganography is more difficult than the other techniques because text contains little redundancy and changes can be detected quite easily. Some text steganography techniques are discussed here along with their characteristics and working.
Keywords—linguistic; steganography; cryptography; statistical; metamorphic.
I. INTRODUCTION
B. Text Steganography
Text steganography is broadly divided into the following three categories [4]: (i) Format-based methods: secret data is embedded in the cover text by changing its formatting, for example by resizing the font, inserting spaces between words, or inserting non-displayed characters. (ii) Linguistic methods: linguistic analysis of the text is used to embed the data. (iii) Random and statistical generation methods: to avoid comparison with a known plain text, most steganographers generate their own cover texts.
Information is an important asset of mankind, and its security is an essential concern. Risk increases when working on real-time systems such as banking, railway and flight systems. The chance of attack increases when data is transmitted over the internet. Several types of attack are possible, such as eavesdropping, man-in-the-middle attacks, phishing and denial of service. To secure data, three main solutions are available: a private dedicated channel, cryptography and steganography. A private dedicated channel is time consuming and restricts the user to a physical point. Cryptography transforms the message into some other form. Cryptography and steganography can also be combined, which is known as metamorphic cryptography [1].
Text steganography is the most difficult kind of steganography because a text file lacks the large-scale redundancy of information found in other digital media such as image, audio and video [5]. The structure of a text document remains the same throughout, i.e. the file is transparent during the saving, writing and retrieval phases. While embedding data in a text file, the main concern is its structure, which should not change. If the structure is changed, the whole meaning of the text file changes, whereas in other digital media changes can be made easily without any notable change in the output. Many languages are used to hide data, such as Persian, Arabic [6], Hindi and English. The English language possesses characteristics such as inflexion, the use of periphrases and fixed word order. Inflexion means that the relationship of words in a sentence can be indicated with a minimum change of shape. Periphrases make it possible to express something in different ways. In fixed order, the relationship of a
A. Steganography
Steganography is the art of hiding data inside a digital medium such as audio, image, video, text or protocol [2]. Frequent terms used in steganography are: 1) Cover object: the text, audio, video or image used for embedding data. It is also known as the vessel object. 2) Secret data/message: the data that is to be embedded in a cover object. 3) Stego object: the resultant output obtained after embedding.
978-1-5090-0210-8/16 $31.00 © 2016 IEEE DOI 10.1109/CICT.2016.34
II. RELATED WORK
3) Change of Spelling
In [13] the author uses this method to embed secret data in a text file. The method exploits words that are spelled differently in American and British English in order to hide secret message bits. Methods that change the format of the text can hide a large amount of data. Table 3 shows some words that have different spellings in UK and US English.
Text steganography methods are broadly classified into two groups: A) Changing the format of the text: the format of the text file is altered in this method. B) Changing the meaning of the text: the main focus of this method is to change the meaning of the text.
Table 3. Word Spelling method
There are only a limited number of methods based on changing the meaning of the text, so our main focus is on methods that change the format of the text. Some of these methods are described below.
American English: Airplane, Fiscal, Unalike
A. Changing the Format of the Text
1) Semantic Method
This method works by inserting spaces in a cover text file. A single inserted space represents the hidden bit ‘0’, while two consecutive spaces represent ‘1’ (or vice versa), for example at the end of a sentence. White spaces can be inserted at the end of a line or paragraph, or between words or sentences. The inserted white space does not arouse suspicion in the mind of a steganalyst. However, some text editors automatically delete extra spaces while formatting, which destroys the hidden information [8]. Inserting white spaces between HTML tags does not affect the visibility of the web content. The drawbacks of this technique are that the inserted spaces increase the size of the text file and that only a small amount of data can be hidden.
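The space-based scheme described above can be illustrated with a minimal Python sketch. It assumes the convention "one space = ‘0’, two spaces = ‘1’" applied only to the gaps between words, and it assumes that the receiver knows how many bits were hidden. The function names `embed_bits` and `extract_bits` are hypothetical and are not taken from any of the surveyed papers.

```python
import re


def embed_bits(cover_text: str, bits: str) -> str:
    """Hide bits in inter-word gaps: one space encodes '0', two spaces encode '1'."""
    words = cover_text.split()  # note: this normalises any existing whitespace
    gaps = len(words) - 1
    if len(bits) > gaps:
        raise ValueError("cover text has too few inter-word gaps for the message")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps beyond the end of the message are left as a single space.
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        pieces.append(gap + word)
    return "".join(pieces)


def extract_bits(stego_text: str, n_bits: int) -> str:
    """Read the first n_bits gaps back: a run of two spaces is '1', otherwise '0'."""
    gaps = re.findall(r" +", stego_text)
    if n_bits > len(gaps):
        raise ValueError("stego text has fewer gaps than requested bits")
    return "".join("1" if len(gap) == 2 else "0" for gap in gaps[:n_bits])


cover = "the quick brown fox jumps over the lazy dog"
stego = embed_bits(cover, "10110")
recovered = extract_bits(stego, 5)
```

The sketch also makes the stated drawbacks concrete: each gap carries a single bit, and any tool that collapses repeated spaces erases the message.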
Table 1. Semantic method
SYNONYM: Idle, Difficult, Sad
5) Format Based Text Steganography Method
Sangita Roy et al. [14] propose a novel approach to format-based text steganography that combines two popular methods, word shifting and line shifting, with a copy protection technique, giving a high capacity in the cover object. This approach embeds data in binary form rather than in character format. More than one bit is embedded in each line of cover text, so the method has good hiding capacity and causes little distortion in the cover text. The method is hybrid, as it fuses two methods (line shifting and word shifting) and uses a special character for performing text steganography.
2) Text Abbreviation or Acronym
Abbreviations and acronyms are used for hiding data. The target phrase is replaced by its acronym; for example, "as soon as possible" is replaced by ASAP. This technique is mostly used in SMS and in social networking applications and sites. Mohammad Shirali-Shahreza and M. Hassan Shirali-Shahreza from Iran have used this technique in [10]. Only a small amount of data can be hidden in a file of several kilobytes [7][11] using this method. The method can also be used to reduce the size of the secret data text file [12] before steganography is applied by some other method.
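A minimal Python sketch of acronym substitution follows, using the three pairs listed in Table 2. The bit convention (full phrase = ‘0’, acronym = ‘1’), the names `ACRONYMS`, `embed` and `extract`, and the requirement that the receiver knows the number of hidden bits are all assumptions made for illustration; they are not taken from [10].

```python
import re

# Phrase/acronym pairs from Table 2.
ACRONYMS = {
    "identification": "ID",
    "date of birth": "DOB",
    "as soon as possible": "ASAP",
}

# Map every surface form (lower-cased) back to its full phrase.
TO_PHRASE = {}
for phrase, acronym in ACRONYMS.items():
    TO_PHRASE[phrase] = phrase
    TO_PHRASE[acronym.lower()] = phrase
ACRONYM_FORMS = {acronym.lower() for acronym in ACRONYMS.values()}

# Longest forms first, so multi-word phrases are matched as a whole.
_forms = sorted(TO_PHRASE, key=len, reverse=True)
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, _forms)) + r")\b", re.IGNORECASE)


def embed(cover: str, bits: str) -> str:
    """Rewrite each carrier phrase: bit '1' -> acronym, bit '0' -> full phrase."""
    remaining = iter(bits)

    def choose(match):
        bit = next(remaining, None)
        if bit is None:  # message exhausted: leave the text as it is
            return match.group(0)
        phrase = TO_PHRASE[match.group(0).lower()]
        return ACRONYMS[phrase] if bit == "1" else phrase

    return PATTERN.sub(choose, cover)


def extract(stego: str, n_bits: int) -> str:
    """Read one bit per carrier phrase, in order of appearance."""
    found = [m.group(0).lower() for m in PATTERN.finditer(stego)]
    return "".join("1" if form in ACRONYM_FORMS else "0" for form in found[:n_bits])


cover = "send your identification and date of birth as soon as possible"
stego = embed(cover, "101")
```

Each carrier phrase holds one bit, which is consistent with the low capacity noted above. The sketch does not preserve the capitalisation of the original phrase.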
a) Encoding Procedure: The secret message and the cover object are taken as input and encoding is then applied. The bits of the secret message are counted to check whether their number is even or odd. If the number of bits is odd, a ‘0’ is added to the left; otherwise no change is made. The secret message is then divided into blocks of 2 bits each, and the blocks are stored in an array. Embedding is done by finding the next embedding position for each block. Four cases arise while embedding the secret message:
Table 2. Abbreviation or acronym method
ACRONYM: ID, DOB, ASAP
British English: Aeroplane, Financial, Unlike
4) Open Spaces Or White Spaces
Semantics refers to the meaning of words; this method relies on synonyms, i.e. words with the same meaning. It hides data by replacing a word with its synonym. Synonym substitution may hide a single bit or multiple bits of secret information. The hidden information survives retyping and OCR programs. Sometimes, however, the meaning of the text is altered by this method [7][8][9]. M. Hassan Shirali-Shahreza [9] has used the semantic method for embedding a secret message in a text file.
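Synonym substitution can be sketched in a few lines of Python using the word pairs of Table 1. The convention that the listed word encodes ‘0’ and its synonym encodes ‘1’ is an assumption, as are the names `PAIRS`, `embed` and `extract`; the sketch also assumes whitespace-separated words without attached punctuation and a receiver who knows the message length.

```python
# Word/synonym pairs from Table 1: the first form encodes '0', the second '1'.
PAIRS = [("lazy", "idle"), ("hard", "difficult"), ("unhappy", "sad")]

# Look up the (zero_form, one_form) pair from either member of the pair.
FORMS = {}
for zero_form, one_form in PAIRS:
    FORMS[zero_form] = (zero_form, one_form)
    FORMS[one_form] = (zero_form, one_form)


def embed(cover: str, bits: str) -> str:
    """Replace each carrier word by the form that encodes the next message bit."""
    out, used = [], 0
    for word in cover.split():
        pair = FORMS.get(word.lower())
        if pair is not None and used < len(bits):
            word = pair[int(bits[used])]
            used += 1
        out.append(word)
    if used < len(bits):
        raise ValueError("cover text has too few carrier words for the message")
    return " ".join(out)


def extract(stego: str, n_bits: int) -> str:
    """Recover the first n_bits bits from the carrier words, in order."""
    bits = []
    for word in stego.split():
        pair = FORMS.get(word.lower())
        if pair is not None and len(bits) < n_bits:
            bits.append(str(pair.index(word.lower())))
    return "".join(bits)


cover = "the lazy student found the hard exam made him unhappy"
stego = embed(cover, "101")
```

Because the bits live in the choice of words rather than in the layout, the message survives retyping. The risk noted above remains: a substituted synonym may not fit every context.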
WORD: Lazy, Hard, Unhappy
British English
WORD: Identification, Date Of Birth, As Soon As Possible
(i) If block[i] = ‘00’, the line shifting method is applied: go to the end of the line and call the shrink procedure, which shrinks the font size of the line.
also make sure that all the HTML tags are closed properly. There are several approaches to hiding data, such as the table-driven approach and the lexicographic approach. The table-driven approach is applied to tags having two or more attributes. One attribute is used as the key and the other as the secondary. A database is created for pairing key and secondary values. The lexicographic approach is more efficient than the table-driven approach: the latter can encode at most n/2 bits of information, where n is the number of attributes associated with the tag, while the lexicographic approach can hide n-1 bits, which is almost twice as much.
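The idea of hiding bits in attribute order can be illustrated with a deliberately simplified Python sketch. It is not the table-driven or lexicographic scheme of [15]; it assumes a tag with exactly two attributes and encodes a single bit: alphabetical order stands for ‘0’ and reversed order for ‘1’. The names `write_tag` and `read_bit` are hypothetical.

```python
import re


def write_tag(name: str, attrs: dict, bit: str) -> str:
    """Serialise a two-attribute tag; the attribute order carries one hidden bit."""
    first, second = sorted(attrs)  # raises ValueError unless there are exactly two
    order = (first, second) if bit == "0" else (second, first)
    body = " ".join(f'{key}="{attrs[key]}"' for key in order)
    return f"<{name} {body}>"


def read_bit(tag: str) -> str:
    """Recover the bit: alphabetical attribute order means '0', otherwise '1'."""
    names = re.findall(r'([\w-]+)="[^"]*"', tag)
    return "0" if names == sorted(names) else "1"


attrs = {"src": "a.png", "alt": "logo"}
tag_zero = write_tag("img", attrs, "0")
tag_one = write_tag("img", attrs, "1")
```

Both tags render identically in a browser and have the same length, which matches the constraints stated for HTML cover files: nothing is added or removed, only reordered.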
(ii) If block[i] = ‘11’, go to the end of the line and call the expand procedure, which expands the font size of the line. (iii) If block[i] = ‘01’, embed ‘0’: use two spaces instead of one space between two words. (iv) If block[i] = ‘10’, embed ‘1’: use an extra space before any special character. If no special character is present, add one special character to embed ‘1’. The same procedure is applied to the other blocks, and the result is the stego text. This method is robust and, compared with the existing algorithm, can hide a large amount of data. It requires a maximum of four inter-word spaces. The requirement can be further reduced by using combinations of ‘00’ and ‘11’ bits. The algorithm works best for centre-aligned text, performs well for left- and right-aligned text, and performs worst for justified text.
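The block preparation step of the encoding procedure, together with the four embedding cases, can be sketched as follows. The Python code only plans which formatting operation each 2-bit block would trigger; it does not modify fonts or spacing, since those depend on the document format. The names `to_blocks`, `plan` and the operation labels are hypothetical.

```python
# Formatting operation selected by each 2-bit block, following the four cases.
ACTIONS = {
    "00": "shrink_line",           # line shifting: shrink the font size of the line
    "11": "expand_line",           # expand the font size of the line
    "01": "double_space",          # two spaces instead of one between two words
    "10": "space_before_special",  # extra space before a special character
}


def to_blocks(bits: str) -> list:
    """Left-pad with '0' if the bit count is odd, then split into 2-bit blocks."""
    if len(bits) % 2 == 1:
        bits = "0" + bits
    return [bits[i:i + 2] for i in range(0, len(bits), 2)]


def plan(bits: str) -> list:
    """Pair each block with the operation that would embed it."""
    return [(block, ACTIONS[block]) for block in to_blocks(bits)]


steps = plan("10110")
```

Note that the left padding is not reversible by itself: the receiver cannot tell a padded ‘0’ from a genuine leading ‘0’ unless the original length is agreed in advance.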
Table 4. Lexicographical vs. Table Driven HTML Tags in different websites
b) Decoding Procedure: The stego text is stored in an array and scanned using OCR software. All the spaces are then used for extraction. Four cases arise during extraction: (i) If two spaces occur consecutively, extract ‘01’. (ii) If a special character is present after a space in the stego text, extract ‘10’. (iii) If the line size is less than the standard line size, extract ‘00’. (iv) If the line size is greater than the standard line size, extract ‘11’.
7) XML Document
All the extracted bits are combined into an array, and the result is the secret message. In this method the secret text is compressed and then embedded using the proposed algorithm. The technique encodes ‘0’ using a single space and ‘1’ using two spaces. The method depends entirely on the format (structure) of the text and can be used to prevent illegal duplication and distribution of text, especially electronic data. Its major disadvantage is that many word processing programs remove spaces from the text file, which destroys the secret message. The method can also be applied to hard copy documents. The time complexity of the proposed method is O(n^2), whereas the time complexity of the existing algorithm is O(n).
XML stands for Extensible Markup Language. It is a platform-independent, universal language, mainly used for storing, exchanging and transferring information electronically. XML documents are lightweight and can be used on the internet and in messaging. Any user can manipulate the content of an XML document, so the prime concern with XML documents is security, which can be ensured by techniques that guarantee integrity and confidentiality. XML documents can be used as the cover medium for text steganography [17]. According to [17], once secret data is embedded in an XML document it cannot be altered, traced back or intercepted back to the sender. XML documents follow a database-like format. SGML (Standard Generalized Markup Language) and HTML (Hyper Text Markup Language) are also used to send information over the internet; XML is a simplified subset of SGML. XML enables the transmission, validation, definition and interpretation of data between heterogeneous applications and computing platforms, and provides a framework for tagging structured data. It also provides flexible document definition and processing capabilities. One special feature of XML documents is flexibility, i.e. the user can format the data to be displayed on multiple devices and platforms. Performing steganography on XML documents is efficient because XML is widely used for exchanging data and is regarded as a language of digital content and web pages. The authors of [17] discuss four methods of performing steganography on XML documents. The first
6) HTML Tags
HTML tags fine-tune their effects by using attributes, which can appear in any order. Steganography can be performed by exploiting this ordering. In [15] the author hides a secret message in the ordering of these attributes, developing a text steganographic technique for HTML based on attribute reordering. This reordering does not add or remove any content in the files. When using HTML files, certain constraints must be respected so that the secret message remains undetectable: the size of the HTML file should not be modified, and its display should not be affected either in plain text format or in a web browser. A disadvantage of using HTML is the lack of redundant bits. The size of the HTML file is directly proportional to the message size. The author has used HTMLTidy [16] as the HTML parser for implementing steganography [13], which is used for cleaning HTML files and
International Conference on Intelligent System and Knowledge Engineering.
[11] M. Hassan Shirali-Shahreza and Mohammad Shirali-Shahreza, “Text Steganography in Chat”, IEEE, 2007.
[12] Shivani, Yadav. V, Batham. S, “A Novel Approach of Bulk Data Hiding using Text Steganography”, accepted in Elsevier ICRTC 2015, in press.
[13] Khan Farhan Rafat, “Enhanced Text Steganography By Changing Word’s Spelling”, FIT’09, December 16–18, 2009, CIIT, ACM.
[14] Sangita Roy, Manini Manasmita, “A Novel Approach to Format Based Text Steganography”, ICCCS’11, February 12–14, 2011, ACM.
[15] Sudeep Ghosh, “StegHTML: A message hiding mechanism in HTML tags”, December 10, 2007, http://www.cs.virginia.edu/~skg5n/main.pdf.
[16] D. Raggett, “Htmltidy”, tidy.sourceforge.net, 2004.
[17] Aasma Ghani Memon, Sumbul Khawaja and Asadullah Shah, “Steganography: A New Horizon For Safe Communication Through Xml”, Journal of Theoretical and Applied Information Technology, 2005
technique hides data by inserting random characters between XML tags and their values. In this technique, known as the Random Character Technique, the number of inserted random characters increases from 1 to n after each word of the tag. The process is repeated until a full stop (.) is encountered and is then applied recursively to all tags in the XML document. The second technique shuffles tags in a predetermined sequence: the 1st tag is swapped with the last tag, the 2nd tag with the second-last tag, and so on until all the tags are swapped. Both the position and the value of each tag are swapped. The third technique, known as Attribute Specified Shuffling of Tags, saves the order of the tags in attributes before shuffling. The last technique, the Reverse Character Technique, reverses the sequence of characters in a tag; for example, the tag ‘width’ becomes ‘htdiw’. After the tag is reversed, its value is reversed as well. The procedure is repeated until a full stop (.) is encountered.
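Two of the techniques above, tag shuffling and the Reverse Character Technique, are simple enough to sketch in Python. The sketch models an XML document as a flat list of (tag, value) pairs instead of parsing real XML, which is a simplifying assumption; the names `shuffle_tags` and `reverse_characters` are hypothetical and are not taken from [17].

```python
def shuffle_tags(elements):
    """Swap the 1st element with the last, the 2nd with the second-last, and so on."""
    items = list(elements)  # work on a copy; each item is a (tag, value) pair
    i, j = 0, len(items) - 1
    while i < j:
        items[i], items[j] = items[j], items[i]
        i += 1
        j -= 1
    return items


def reverse_characters(elements):
    """Reverse the characters of every tag name and of its value."""
    return [(tag[::-1], value[::-1]) for tag, value in elements]


document = [("width", "100"), ("height", "250"), ("title", "logo")]
shuffled = shuffle_tags(document)
reversed_doc = reverse_characters(document)
```

Both transformations are their own inverse, so a receiver who knows which technique was used can restore the document by applying the same function again.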
- 2008 JATIT.
[18] V. K. Yadav, et al., “Zero Distortion Technique: An Approach to Image Steganography on color images”, in Proc. International Conference on Information and Communication Technology for Competitive Strategies, ICTCS '14, November 14–16, pages 79-83 (published by ICPS-ACM, Proceedings Volume ISBN No: 978-1-4503-3216-3).
[19] V. K. Yadav, et al., “ICSECV: An Efficient Approach of Video Encryption”, in Proc. Contemporary Computing (IC3), 2014 Seventh International Conference, 7-9 Aug. 2014, pages 425-430.
[20] V. K. Yadav, et al., “Zero Distortion Technique: An Approach to Image Steganography using Strength of Indexed Based Chaotic Sequence”, in SSCC-2014, symposium proceedings published by Springer in Communications in Computer and Information Science Series (CCIS), Volume 467, 2014, pp. 407-416, ISSN: 1865-0929.
III. CONCLUSION
Considerable research has been carried out in the area of text steganography. With the advancement of technology and the tools available, it is now essential to develop steganography algorithms that can withstand attacks.
ACKNOWLEDGMENT
I would like to thank Prof. Anuja Kumar Acharya, B.M. Mehtre and Prof. Munesh Chandra Trivedi for their support and discussions.
REFERENCES
[1]
Thomas Leontin Philjon. J, Venkateshvara Rao. N, “Metamorphic Cryptography - A Paradox between Cryptography and Steganography Using Dynamic Encryption”, IEEE, 2011.
[2] Westfeld A, J. Camenisch et al., “Steganography for Radio Amateurs - A DSSS Based Approach for Slow Scan Television”, Springer-Verlag Berlin Heidelberg, pp. 201-215.
[3] Mohammad Shirali-Shahreza, “Text Steganography by Changing Words Spelling”, ISBN 978-89-5519-136-3, Feb. 17-20, 2008, ICACT 2008.
[4] Krista Bennett, “Linguistic Steganography: Survey, Analysis, and Robustness Concerns for Hiding Information in Text”, CERIAS TR 2004-13, 2004.
[5] Shraddha Dulera, Devesh Jinwala and Aroop Dasgupta, “Experimenting with the Novel Approaches in Text Steganography”, International Journal of Network Security & Its Applications (IJNSA), Vol. 3, No. 6, November 2011.
[6] M. H. Shirali-Shahreza, M. Shirali-Shahreza, “A new approach to Persian/Arabic text steganography”, Proc. 5th Int. Conf. Computer and Information Science, Washington, 2006, pp. 310-315.
[7] Khan Farhan Rafat, “Enhanced Text Steganography in SMS”, IEEE, 2008.
[8] Mohammad Shirali-Shahreza and M. Hassan Shirali-Shahreza, “Text Steganography in SMS”, International Conference on Convergence Information Technology, 2007.
[9] M. Hassan Shirali-Shahreza and Mohammad Shirali-Shahreza, “A New Synonym Text Steganography”, International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 9780-7695-3278-3/08 © 2008 IEEE.
[10] Mohammad Shirali-Shahreza and Sajad Shirali-Shahreza, “Steganography in Text Documents”, Proceedings of 2008, 3rd