
LSB Steganography in C (Guide)

Embedding secret data into regular files.

Preface

I thought hiding secret data in files was going to be a hard task, since I didn't know anything about it and there were almost no resources on how to do it. It turned out to be surprisingly easy and fast to learn (if we don't count the headache with bitwise operations you are going to see later). I was thinking about not writing this article at all after my question lost its meaning for me, but I guess it's a good idea to write a detailed guide and place it somewhere on the internet (here).

By the way, this article is not really for complete beginners in C: you will need some basic skills in working with files and an understanding of pointers. But I assume that if you are visiting this website, you are already a chad.

What is "Least Significant Bit"?

Least Significant Bit (LSB) steganography is a technique for hiding data inside other files by changing the last bit of a byte. For example, say we have some picture whose bytes look similar to this in binary:

01100001
01100010
01100011
01100100
01100101
01100110
01100111
01101000

And we are going to replace the last bit of each byte with our own data. It will not have a visible effect on the file at all, but we now have secret data in there. For example, let's insert the character 9 into our image, music or whatever (the ASCII character 9 is 00111001 in binary):

0110000 0
0110001 0
0110001 1
0110010 1
0110010 1
0110011 0
0110011 0
0110100 1

We replaced the last bit of each byte with our data. Done properly, this technique will not make any visible changes at all, but how to do it properly is different for every type of file. That's where I met the most confusion when researching the topic, and I still can't explain and cover every file type in existence, so we are going to talk only about the .wav and .png file formats: the most popular ones, and yet the easiest to understand.

Besides the theory, this article also covers a little bit of C and raw bit manipulation, so that you can write your own steganography tool.
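The table above can be reproduced in a few lines of C. This is only a minimal sketch: hide_byte is a made-up helper name (not part of the tool built later in this guide), and it assumes the secret is written most significant bit first, which is the order the table uses. The bit operations themselves are explained further down.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: hide one secret byte in the last bits of eight
   carrier bytes, leftmost secret bit first (same order as the table). */
void hide_byte(uint8_t carrier[8], uint8_t secret)
{
    for (size_t i = 0; i < 8; i++) {
        uint8_t bit = (secret >> (7 - i)) & 1;               /* bit i, counted from the left */
        carrier[i] = (uint8_t)((carrier[i] & 0xFE) | bit);   /* clear the last bit, then set it */
    }
}
```

Calling it on the eight bytes 0x61 to 0x68 from the first table with the character '9' should give exactly the bytes of the second table.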

.Wav Files

In LSB steganography it's important to skip the header of the file, since it defines the file itself; modifying it even slightly might damage the file, which we do not want. The standard header size in .wav files is 44 bytes, and apart from this we have no restrictions like compression, which makes .wav one of the simplest formats, in my opinion, to embed our data into. However, one of my files had a slightly bigger header, 48 bytes, so take into account that the size may vary, even though as far as I know that's pretty rare.
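One way to cope with headers that are not exactly 44 bytes is to look for the "data" chunk instead of hard-coding the offset. The sketch below assumes the usual RIFF layout ("RIFF", a 4-byte size, "WAVE", then a list of chunks that each start with a 4-byte id and a 4-byte little-endian size). wav_data_offset is a hypothetical helper, not something the rest of this guide uses, and it works on a file that was already loaded into memory.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first audio sample byte,
   or 0 if the buffer does not look like a RIFF/WAVE file with a "data" chunk. */
size_t wav_data_offset(const uint8_t *buf, size_t len)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;

    size_t pos = 12;                         /* first chunk starts right after "WAVE" */
    while (len - pos >= 8) {                 /* need room for chunk id + chunk size */
        uint32_t size = (uint32_t)buf[pos + 4]
                      | (uint32_t)buf[pos + 5] << 8
                      | (uint32_t)buf[pos + 6] << 16
                      | (uint32_t)buf[pos + 7] << 24;   /* little-endian chunk size */

        if (memcmp(buf + pos, "data", 4) == 0)
            return pos + 8;                  /* samples begin after id + size */

        /* skip this chunk: 8 bytes of id/size, the body, and a pad byte if the size is odd */
        uint64_t skip = 8 + (uint64_t)size + (size & 1);
        if (skip > len - pos)
            return 0;                        /* chunk claims more bytes than we have */
        pos += (size_t)skip;
    }
    return 0;
}
```

For a plain 44-byte header this returns 44, and for a file with a longer "fmt " chunk or extra chunks before the audio it returns the larger offset, which could then be used in place of a fixed header size.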

So we can simply do what I described: skip the header so we don't damage the file, change the last bit of every byte after it, and that's all, our secret is hidden. But when we extract this secret back, we are going to get a lot of garbage after our secret message, file or whatever we hid, because the extractor keeps reading the last bits of bytes we never touched. For example: "Hellokjfdsoiewmpodifmwodimflkwem". That's no big deal if we are hiding plain text, since a human can just skip all the noise, but if we are hiding an archive this noise will damage it.

The solution to this problem is pretty simple: before our archive we are going to put our own sequence of characters, a header for our data, and in this header we also include the size of our archive, file or whatever, so our program can cut off all the garbage.
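The embedding code later in this guide uses a struct steg_header with a .magic and a .size field, but the definition itself is never shown. The sketch below is an assumption reconstructed from how those fields are used (four magic characters compared against "STEG", and a size that receives a uint64_t); the exact field types are a guess.

```c
#include <stdint.h>

/* Assumed layout, reconstructed from how the fields are used in this guide. */
struct steg_header {
    char     magic[4];   /* 'S','T','E','G', with no terminating '\0' */
    uint64_t size;       /* payload length in bytes */
};
```

Keep in mind that the struct is copied into the carrier as raw memory. On common 64-bit platforms the compiler typically adds 4 padding bytes after magic, and the size field is stored in the machine's byte order, so a file made this way should be extracted by a build with the same layout.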
So we are going to start our project (finally), but how do we actually change the last bit of a byte? Here we are going to explore a little bit of bit manipulation in C.

    #include <stdint.h>

    void embed_bit(uint8_t *file_byte, uint8_t bit) {
        *file_byte = (*file_byte & 0xFE) | bit;
    }

Note: uint8_t is a data type holding an 8-bit unsigned integer, which stores values from 0 to 255.

In the code, the single & is the bitwise AND operator. It walks through the whole byte and compares each bit of *file_byte to the matching bit of our mask 0xFE (11111110 in binary), writing 1 only if both bits are 1. As a result, all bits in *file_byte remain unchanged except for the last one. The last bit is forced to 0 because the last bit of the mask is 0, so both bits can never be 1 there.

Example:
01001111 & 11111110 = 01001110

After that we use the bitwise OR operator |, which sets each bit to 1 if at least one of the two bits is 1.
Later in the code we are going to pass our bit as either 00000001 or 00000000, so the OR just sets the last bit to 1 or leaves it at 0, depending on our data.
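To watch the two steps separately, here is a small variation of embed_bit that returns the new byte instead of writing through a pointer. set_lsb is a hypothetical name used only for this illustration. The extra "& 1" is my own addition: it makes sure a caller who passes something other than 0 or 1 cannot disturb the upper seven bits.

```c
#include <stdint.h>

/* Hypothetical variant of embed_bit with the AND and OR steps split apart. */
uint8_t set_lsb(uint8_t file_byte, uint8_t bit)
{
    uint8_t cleared = file_byte & 0xFE;       /* step 1: force the last bit to 0 */
    return (uint8_t)(cleared | (bit & 1));    /* step 2: copy our bit into the freed slot */
}
```

With the byte from the example above, set_lsb(0x4F, 0) gives 0x4E (01001110) and set_lsb(0x4F, 1) gives back 0x4F (01001111).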

Let's work on our embedding function. It takes 3 arguments: the name of the target file, the payload file (can be anything), and the name of the modified .wav file to output. Then we open the inputs for reading and the output file for writing. All three are opened in binary mode ("rb" / "wb"), because in text mode some platforms translate line endings, which would corrupt our bytes.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define WAV_HEADER_SIZE 44

    void embed_data_wav(
        const char *wav_in,
        const char *payload_file,
        const char *wav_out
    ) {
        FILE *fwav = fopen(wav_in, "rb");        // target file
        FILE *fp   = fopen(payload_file, "rb");  // payload
        FILE *out  = fopen(wav_out, "wb");       // output file
        if (!fwav || !fp || !out) {              // stop if any file failed to open
            perror("fopen");
            exit(1);
        }

Now we create an array of bytes named "header", read the .wav header into it and copy it to our output file.

        uint8_t header[WAV_HEADER_SIZE];          // array of header bytes
        fread(header, 1, WAV_HEADER_SIZE, fwav);  // copy bytes from "fwav" into "header"
        fwrite(header, 1, WAV_HEADER_SIZE, out);  // write "header" into "out"

Here we determine the payload size by using fseek and ftell.

        fseek(fp, 0, SEEK_END);                       // go to the end of the file
        uint64_t payload_size = (uint64_t)ftell(fp);  // position at the end = file size
        fseek(fp, 0, SEEK_SET);                       // go back to the start of the file

        struct steg_header sh = {                     // initialize our header
            .magic = {'S','T','E','G'},
            .size = payload_size
        };

        uint64_t total_size = sizeof(sh) + payload_size;  // total size of header + payload
        uint8_t *buffer = malloc(total_size);
        if (!buffer) {                                    // stop if we are out of memory
            perror("malloc");
            exit(1);
        }
        memcpy(buffer, &sh, sizeof(sh));                  // copy header into buffer
        fread(buffer + sizeof(sh), 1, payload_size, fp);  // read payload right after the header

Now the most complicated part, where we iterate through the whole .wav file, insert the payload and write it all into the new output file.

        uint64_t bit_index = 0;
        uint8_t file_byte;

        while (fread(&file_byte, 1, 1, fwav) == 1) {

The while loop we created reads one byte at a time from fwav into file_byte. fread advances the file position by itself on every call, so we just need to check whether there is still data to read.

Below, we check whether bit_index is still below our total_size to keep inserting the payload. total_size counts bytes, not bits, so we multiply it by 8. If the check fails, we just write the rest of the .wav file as it is.

            if (bit_index < total_size * 8) {
                ...
            }
            fwrite(&file_byte, 1, 1, out);
        }

Now take a look at the inside of our if statement.

                uint8_t current_byte = buffer[bit_index / 8];
                uint8_t bit = (current_byte >> (7 - (bit_index % 8))) & 1;
                embed_bit(&file_byte, bit);  // function we wrote earlier
                bit_index++;

The current_byte value is easy to understand: we copy one byte from our buffer, and dividing bit_index by 8 tells us which byte we are working with. Note that this is integer division: C drops the fractional part, so 5 / 8 (0.625 mathematically) is simply 0.
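Before taking the bit-picking line apart, it can help to try it on its own. The sketch below wraps the two lines from the if statement into a helper; get_bit is a hypothetical name that the embedding function itself does not use.

```c
#include <stdint.h>

/* Hypothetical helper: return bit number bit_index of buffer,
   counting from the leftmost bit of buffer[0]. */
uint8_t get_bit(const uint8_t *buffer, uint64_t bit_index)
{
    uint8_t current_byte = buffer[bit_index / 8];            /* which byte */
    return (current_byte >> (7 - (bit_index % 8))) & 1;      /* which bit inside that byte */
}
```

For a buffer holding 10101100 followed by 00111001, get_bit returns 1, 0, 1 for indexes 0, 1, 2, and index 10 lands on the third bit of the second byte, which is 1.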

Okay, but how does this bit expression work? It looks so complicated!!! (it isn't)

"bit_index % 8" - % is the modulo operator, so this gives the position of the bit within the current byte. A byte has 8 bits (positions 0 to 7). So if bit_index = 10, then bit_index % 8 = 2, which means we are looking at the 3rd bit from the left in this byte.

"7 - (bit_index % 8)" - Bits in a byte are usually numbered from left to right as 7 6 5 4 3 2 1 0, so subtracting our left-to-right position from 7 tells us how far the bit we want sits from the right end, which is exactly how many places we need to shift.

"current_byte >> (7 - (bit_index % 8))" - >> is the right-shift operator. It moves bits to the right. For example, if we have a byte that looks like 10101100 and shift it to the right by 5 (10101100 >> 5), we get 00000101. The bit we wanted is now the last one.

"& 1" - Clears everything except the last bit, so the result is either 00000001 or 00000000.

And now... we finished embedding our ToP sEcReT data into our .wav file, congratulations! After the while loop we just clean up everything we did: free the buffer and close the files.

        free(buffer);
        fclose(fwav);
        fclose(fp);
        fclose(out);
    }
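One thing the loop above does not check is whether the carrier is big enough: every payload byte needs 8 carrier bytes, and if the .wav runs out first the tail of the payload is silently dropped. Here is a minimal sketch of the same embedding step done on memory buffers with that check added. embed_buffer is a hypothetical helper and not part of the tool above; in the file-based version, carrier_len would correspond to the .wav size minus its header.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical in-memory version of the embedding loop.
   Returns 0 on success, -1 if the carrier cannot hold the data. */
int embed_buffer(uint8_t *carrier, size_t carrier_len,
                 const uint8_t *data, size_t data_len)
{
    if (data_len > carrier_len / 8)          /* 8 carrier bytes per data byte */
        return -1;

    for (size_t bit_index = 0; bit_index < data_len * 8; bit_index++) {
        uint8_t bit = (data[bit_index / 8] >> (7 - (bit_index % 8))) & 1;
        carrier[bit_index] = (uint8_t)((carrier[bit_index] & 0xFE) | bit);
    }
    return 0;
}
```

Here data would be the header and payload together, the same bytes the article's code puts into buffer.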

Extracting data from .wav

Extracting bits is a much easier task than embedding them. We are going to walk through it fast, starting with our simple function which just clears every bit of a byte except the last one.

    uint8_t extract_bit(uint8_t file_byte) {
        return file_byte & 1;
    }

And later in the code we are going to construct bytes bit by bit like this:

    payload[i / 8] <<= 1;
    payload[i / 8] |= extract_bit(file_byte);

Now, again, we open our .wav file with the ToP sEcReT data and take the name of the file where we are going to output our extracted payload.

    void extract_data_wav(const char *wav_in, const char *out_file) {
        FILE *fwav = fopen(wav_in, "rb");
        FILE *out  = fopen(out_file, "wb");
        if (!fwav || !out) {                      // stop if a file failed to open
            perror("fopen");
            exit(1);
        }

        fseek(fwav, WAV_HEADER_SIZE, SEEK_SET);   // skip .wav header

Here I'm going to teach you a little malloc trick: we are going to allocate memory as usual but treat it like a file stream. What that means is that we are going to recreate our payload byte by byte inside the allocated memory, and then later write this memory into a file.

But first we need to recreate the header. Thanks to the struct we know what to expect, so we just declare one to get the memory for it, and "header" is a pointer to the start of that structure. Then we overwrite the structure with the bits taken from the .wav file.

        struct steg_header sh;
        memset(&sh, 0, sizeof(sh));         // start from zeroed bytes
        uint8_t *header = (uint8_t *)&sh;   // pointer to the first byte of our header structure

We create a for loop to move bit by bit through the structure. Since sizeof gives the size in bytes, we multiply the value by 8. On each iteration we read the next byte of the .wav file; if you remember, 1 byte of the .wav file holds only 1 bit of our ToP sEcReT dAtA. "header[i / 8]", as I explained, keeps 8 iterations in a row working on the same byte thanks to integer division. On every iteration we shift the bits written so far to the left, making room for the new bit.

        // extract header
        for (size_t i = 0; i < sizeof(sh) * 8; i++) {
            uint8_t byte = 0;
            fread(&byte, 1, 1, fwav);             // reads byte after byte on each iteration
            header[i / 8] <<= 1;                  // shift the bits written so far to the left
            header[i / 8] |= extract_bit(byte);   // write the new extracted bit into the header
        }

        if (memcmp(sh.magic, "STEG", 4) != 0) {   // check if the structure is even valid
            printf("No payload found\n");
            exit(1);
        }

And to extract the payload we do the exact same thing: allocate memory, then reconstruct the bytes in that memory.

        uint8_t *payload = malloc(sh.size);
        if (!payload) {                           // stop if we are out of memory
            perror("malloc");
            exit(1);
        }
        memset(payload, 0, sh.size);

        for (uint64_t i = 0; i < sh.size * 8; i++) {
            uint8_t file_byte = 0;
            fread(&file_byte, 1, 1, fwav);
            payload[i / 8] <<= 1;
            payload[i / 8] |= extract_bit(file_byte);
        }

After everything is done we can write our buffer into the payload output file, free the payload buffer and close the files.

        fwrite(payload, 1, sh.size, out);
        free(payload);
        fclose(fwav);
        fclose(out);
    }
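The shift-and-OR reconstruction is easy to try without any files. Below is a minimal in-memory sketch; extract_buffer is a hypothetical helper, not part of the tool above, and it assumes the bits were embedded leftmost bit first, as in this guide.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rebuild out_len bytes from the last bits of carrier.
   carrier must hold at least out_len * 8 bytes. */
void extract_buffer(const uint8_t *carrier, uint8_t *out, size_t out_len)
{
    memset(out, 0, out_len);                 /* start every output byte at 0 */
    for (size_t i = 0; i < out_len * 8; i++) {
        out[i / 8] <<= 1;                    /* make room for the next bit */
        out[i / 8] |= carrier[i] & 1;        /* append the carrier's last bit */
    }
}
```

Feeding it the eight modified bytes from the table at the start of this article (0x60, 0x62, 0x63, 0x65, 0x65, 0x66, 0x66, 0x69) gives back a single byte, the character '9'.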

.Png Files

(part 2 is in development)
