## Introduction
What is the first task in a digital forensic investigation? Strictly speaking, preserving the data and obtaining a warrant come first. But what comes to mind for us digital forensic investigators is imaging the Electronically Stored Information (ESI), which is the acquisition process.
Imaging evidence means duplicating the original data so that its integrity is kept. Who investigates using the original? I can firmly say nobody, and that alone shows how important integrity is. So how do we make a forensic copy of a device or a piece of information? Simply copying and pasting the target drive (`ctrl-c`, `ctrl-v`) is inadequate. If you have read my other article, [[Timestamps while copying files]], you know its conclusion: timestamps change when we copy and paste from the original material, so such a copy is worthless as evidence. For this reason, we copy the target at a low level (sector by sector), not at the file system level.
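To see why a file-level copy falls short, here is a minimal sketch using only the standard library. It pins a known modification time on a hypothetical `evidence.txt`, copies it with `shutil.copy`, and compares the two files. The content survives, but the copy gets fresh metadata. Explorer's copy and paste behaves differently depending on the OS and file system, so treat this as an illustration rather than a reproduction of that article's experiment.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file's content."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def compare_copy(workdir):
    """Copy a file at the file-system level and report what stayed the same."""
    original = os.path.join(workdir, "evidence.txt")  # hypothetical file name
    copied = os.path.join(workdir, "evidence_copy.txt")

    with open(original, "wb") as fh:
        fh.write(b"suspicious content")
    # Pin access/modification time to 2020-01-01 00:00:00 UTC so the
    # comparison does not depend on when the script runs.
    os.utime(original, (1577836800, 1577836800))

    shutil.copy(original, copied)  # copies data and permission bits only

    return {
        "same_content": md5_of(original) == md5_of(copied),
        "same_mtime": os.stat(original).st_mtime == os.stat(copied).st_mtime,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_copy(tmp))
```

The hashes match while the timestamps do not, which is exactly the kind of silent change an investigator cannot accept.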
Even when we duplicate data at the low level, we still need a file format that investigation tools can process. Acquisition tools produce several well-known formats, such as `E01`, `dd` (raw), and `AD1`.
## DD file format
### Definition & Characteristics
There is not much to say about this one. `dd` (raw) is probably the most common format you will meet during an investigation. Raw literally means an exact copy of the original data with no processing at all.
An identical copy also means an identical size. That sounds obvious, but it is the main point here.
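The idea of a raw, identical copy fits in a few lines of Python. This is a minimal sketch of what `dd` does, not a replacement for a real acquisition tool: `raw_image` is a hypothetical helper that reads the source in fixed-size blocks, writes every byte unchanged, and hashes the stream at the same time so the copy can be verified. On Linux the source could be a device node such as `/dev/sdb` (which usually needs root privileges, and a write blocker in a real case).

```python
import hashlib


def raw_image(source, destination, block_size=1024 * 1024):
    """Copy `source` byte for byte into `destination`, dd style.

    Returns (bytes_copied, md5_hexdigest) so the image can be checked
    against the source afterwards.
    """
    md5 = hashlib.md5()
    copied = 0
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the source device or file
                break
            dst.write(block)  # no compression, no metadata: a raw copy
            md5.update(block)
            copied += len(block)
    return copied, md5.hexdigest()
```

Because nothing is skipped or compressed, the output is exactly as large as the source.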
If you have ever imaged a drive, or practiced with your own disk, you may have seen a warning that you exceeded the destination disk's capacity. (Oh, don't tell me I'm the only one who saw that message.) The reason is simple: a raw image is exactly as large as the disk it came from, so it can never fit on that same disk, or on any destination with less free space.
This is why many other evidence file formats offer compression.
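To get a feeling for what compression buys you, here is a hedged sketch using Python's `zlib`, the deflate library that EWF uses for compression. It compresses a synthetic, mostly empty "disk" in 32 KiB chunks (the default E01 chunk size). `compressed_size` and the 8 MiB test disk are hypothetical, and real ratios depend entirely on how much of a drive is empty or repetitive.

```python
import os
import zlib

CHUNK_SIZE = 32 * 1024  # default E01 chunk: 64 sectors x 512 bytes


def compressed_size(data, level=6):
    """Compress `data` chunk by chunk with zlib and return the total size."""
    total = 0
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        total += len(zlib.compress(chunk, level))
    return total


if __name__ == "__main__":
    # Hypothetical 8 MiB "disk": 1 MiB of random (incompressible) data
    # followed by 7 MiB of zeros, like unallocated space on a fresh drive.
    disk = os.urandom(1024 * 1024) + bytes(7 * 1024 * 1024)
    packed = compressed_size(disk)
    print(f"raw: {len(disk)} bytes, compressed: {packed} bytes")
```

The zero-filled part shrinks to almost nothing, while the random part does not shrink at all, which is why compressed images of nearly empty drives are so much smaller than raw ones.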
## E01 file format
### Definition & Characteristics
The `E01` file is the image format of EnCase, also known as the Expert Witness Format (EWF), and it holds the acquired evidence. The format is well known thanks to EnCase's reputation: because EnCase images were accepted as admissible in court, EnCase use boomed in the digital forensic field. Its reputation has declined somewhat since the company behind EnCase changed and other excellent tools entered the market, but it still holds a large market share.
Back on track: the main feature of an E01 file is that it packs the evidence into its own structure so the evidence can be processed, analyzed, and preserved. Other evidence formats share these traits, but since `E01` is the most popular format today, we will cover only `E01`.
First, strictly speaking, an `E01` image is not a single `E01` file; it is a set of `E0N` files. Forensic acquisitions often deal with very large media, so an E01 image can be split into multiple segment files with the extensions `E01`, `E02`, and so on. The segments are linked together, so any software that supports the E01 format will read all of them when you give it only the `E01` file.
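Libraries usually collect the segments for you (`pyewf.glob()` in the sample code later does this), but the idea is easy to sketch. `find_segments` is a hypothetical helper that gathers every numbered segment sharing the first segment's name and sorts them by number. It only handles the numeric `.E01` to `.E99` range; larger sets may switch to letter-based extensions, which this sketch does not cover.

```python
import os
import re

# Numeric EWF segment extensions (.E01 to .E99), in either letter case.
_SEGMENT_EXT = re.compile(r"\.[eE](\d{2})")


def find_segments(first_segment):
    """Return the paths of all numbered segments sharing the stem of `first_segment`."""
    directory = os.path.dirname(first_segment) or "."
    stem = os.path.splitext(os.path.basename(first_segment))[0]
    found = []
    for name in os.listdir(directory):
        base, ext = os.path.splitext(name)
        match = _SEGMENT_EXT.fullmatch(ext)
        if base == stem and match:
            found.append((int(match.group(1)), os.path.join(directory, name)))
    # Sort by segment number so E02 always follows E01.
    return [path for _, path in sorted(found)]
```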
Inside the `E01` file there are also other values, such as hash values for checking integrity and the compression settings used, since the data can optionally be compressed. Knowing the exact structure of E01 would be helpful, but it is more productive to focus on how to use it, so we will not go deep into the structure here.
### Structure
#### Case Info
When you use digital forensic tools that manage a whole case, you are asked for inputs such as the investigator's name, the case number, and the case type. This is the case information stored in the header of the E01 file. As briefly mentioned before, it includes the following:
- Case Investigator’s name
- Case Name
- Description of the Evidence
- Date
- Software version
- Operating System information
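With `pyewf`, these header values can be read straight from the image. A minimal sketch, assuming your `pyewf` build exposes `handle.get_header_values()` returning a dictionary; key names such as `case_number` or `examiner_name` come from libewf and may differ between versions. `case_info` is a hypothetical helper that takes any opened handle, so it can be tried without a real image.

```python
def case_info(handle):
    """Return the non-empty header values of an opened pyewf handle, sorted by key."""
    values = handle.get_header_values()  # assumed: dict of identifier -> string
    return sorted((key, value) for key, value in values.items() if value)


# Example usage (assumes pyewf is installed and evidence.E01 exists):
#   import pyewf
#   handle = pyewf.handle()
#   handle.open(pyewf.glob("evidence.E01"))
#   for key, value in case_info(handle):
#       print(key, value)
#   handle.close()
```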
#### Hash
Integrity is definitely our first objective. Securing integrity proves that the evidence has not been altered, whether intentionally or not.
Two kinds of integrity values are used in an E01 file: a CRC and an MD5 hash.
The CRC (Cyclic Redundancy Check) detects errors in the data, and by default it is computed over every 32KB block (chunk). So the media data in an E01 file is stored as a continuous series of data blocks, each paired with its checksum. (EnCase calls this value a CRC; the libewf format documentation identifies the algorithm as Adler-32.)
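The block-plus-checksum idea can be modeled in memory with `zlib.adler32`. This is a simplified sketch, not the on-disk layout: a real E01 also contains header, table, and hash sections, and compressed chunks are stored differently. `checksum_chunks` and `find_bad_chunks` are hypothetical helpers that show how a per-chunk checksum pinpoints exactly which 32KB block was damaged.

```python
import zlib

CHUNK_SIZE = 32 * 1024  # default EWF chunk: 64 sectors x 512 bytes


def checksum_chunks(data):
    """Split media data into chunks and pair each chunk with its Adler-32 checksum."""
    return [
        (data[i:i + CHUNK_SIZE], zlib.adler32(data[i:i + CHUNK_SIZE]))
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def find_bad_chunks(pairs):
    """Return the indexes of chunks whose stored checksum no longer matches."""
    return [i for i, (chunk, stored) in enumerate(pairs) if zlib.adler32(chunk) != stored]
```

Checking per chunk means a single damaged sector only invalidates one chunk instead of the whole image.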
The hash stored at the end of the image is an MD5 hash of the entire acquired data, which verifies the data as a whole.
> [!question] My hash value has been corrupted!!!
>
> When you calculate a hash yourself, you may find that the hash of the original media differs from the hash you computed over the E01 file.
>
> The MD5 hash stored in the E01 file is the hash of the original data only; it excludes the CRC values and the other metadata in the file. So if you hash the whole E01 file, the value will not match, and that is a mistake in how you computed it, not corrupted evidence.
>
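The callout's point can be checked by hashing the media data exposed by `pyewf` instead of the `.E01` file itself. `media_md5` is a hypothetical helper that only needs `seek`, `read`, and `get_media_size`, the same calls the `ewf_Img_Info` class in the sample code uses. Reading the stored value with `get_hash_value("MD5")` is an assumption about your pyewf version; it may return `None` if the image has no stored MD5.

```python
import hashlib


def media_md5(handle, block_size=1024 * 1024):
    """Hash the media data exposed by an EWF handle, not the .E01 container file."""
    md5 = hashlib.md5()
    size = handle.get_media_size()
    handle.seek(0)
    offset = 0
    while offset < size:
        block = handle.read(min(block_size, size - offset))
        if not block:
            break
        md5.update(block)
        offset += len(block)
    return md5.hexdigest()


# Example usage (assumes pyewf is installed and evidence.E01 exists):
#   import pyewf
#   handle = pyewf.handle()
#   handle.open(pyewf.glob("evidence.E01"))
#   print(media_md5(handle), handle.get_hash_value("MD5"))
#   handle.close()
```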
#### Data Block
As mentioned in the section on hashes, the entire data is split into 32KB blocks.
### Investigation Method
Investigators usually open E01 or raw images with whichever tool they prefer.
The main purpose of this section is to help people who want to build their own software on top of E01 images, not to stand in for the manuals of existing tools.
For simple tools and scripts, we often extract files from an image first and pass them as input. There is absolutely nothing wrong with doing it that way, but it requires extra effort from the investigator, so the quality of the results can vary from one investigator to another.
Therefore, developers of digital forensic tools should make their software produce consistent results from inputs that need as little user interaction as possible.
There are many programming languages to choose from, such as C, C++, and Python, but we will use Python because it is easy to script in and has many libraries available.
The main libraries we use to read and traverse E01 files are `pyewf` and `pytsk3`.
Both can be installed with pip:
```bash
pip install libewf-python
pip install pytsk3
```
Let's look at the sample code.
Improving it is up to you.
### Sample
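```python
import pyewf
import pytsk3


class ewf_Img_Info(pytsk3.Img_Info):
    """Adapter that lets pytsk3 read an EWF (E01) image through a pyewf handle."""

    def __init__(self, ewf_handle):
        self._ewf_handle = ewf_handle
        # TSK_IMG_TYPE_EXTERNAL makes pytsk3 call our read()/get_size()
        super(ewf_Img_Info, self).__init__(url="", type=pytsk3.TSK_IMG_TYPE_EXTERNAL)

    def close(self):
        self._ewf_handle.close()

    def read(self, offset, size):
        # Offsets refer to the acquired media data, not to the .E01 file
        self._ewf_handle.seek(offset)
        return self._ewf_handle.read(size)

    def get_size(self):
        # Size of the original acquired media
        return self._ewf_handle.get_media_size()
```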
This class wraps a `pyewf` handle so that `pytsk3` can read an E01 image as if it were a raw disk.
```python
import pyewf
import pytsk3

# ewf_Img_Info is the adapter class defined above.


class E01_handler:
    """Open an E01 or raw image and expose file-level helpers via pytsk3."""

    def __init__(self, imgpath):
        self.img = imgpath
        self.fsobj, self.raw_handle = self.read_imagefile(imgpath, self.getfileType(imgpath))

    def getfileType(self, filepath):
        # Decide by extension; accept both "E01" and "e01".
        if filepath.split(".")[-1].lower() == "e01":
            return "E01"
        return "raw"

    def read_imagefile(self, imgpath, imgtype):
        if imgtype == "E01":
            filenames = pyewf.glob(imgpath)  # collects E01, E02, ... segments
            fshandle = pyewf.handle()
            fshandle.open(filenames)
            img_Info = ewf_Img_Info(fshandle)
        else:
            img_Info = pytsk3.Img_Info(imgpath)

        try:
            partitionTable = pytsk3.Volume_Info(img_Info)
        except IOError:  # SINGLE-PARTITION IMAGE: no partition table
            return pytsk3.FS_Info(img_Info, offset=0), img_Info

        # MULTI-PARTITION IMAGE: open the first "Basic data partition"
        block_size = partitionTable.info.block_size  # sector size, usually 512
        for partition in partitionTable:
            if b"Basic data partition" in partition.desc:
                fileSystemObject = pytsk3.FS_Info(img_Info, offset=partition.start * block_size)
                return fileSystemObject, img_Info
        raise IOError("no 'Basic data partition' found in " + imgpath)

    def readFile(self, filepath):
        f = self.fsobj.open(filepath)
        size = f.info.meta.size
        offset = 0
        chunks = []
        while offset < size:
            max_size = min(1024 * 1024, size - offset)
            # Read the next slice starting at the current offset.
            buf = f.read_random(offset, max_size)
            if not buf:
                break
            chunks.append(buf)
            offset += len(buf)
        return b"".join(chunks)

    def listdir(self, filepath):
        filelist = self.fsobj.open(filepath).as_directory()
        res = []
        for file_ in filelist:
            if file_.info.name.name in [b".", b".."]:
                continue
            res.append(file_.info.name.name)  # names are bytes
        return res
```
First, I implemented `read_imagefile`, which opens the image file and isn't that complicated. It checks the file type by its extension and then reads the partition table with `pytsk3`. The partition it finds is opened as a file system object, which is used later on.
This object gives access to files by absolute path.
Basic functions such as directory listing and file reading are built on this file system object.
Raw images are handled the same way; the only difference is that they are opened with `pytsk3.Img_Info` directly instead of through `pyewf`.
With just this much, you can build software that parses artifacts or gathers data left by a program.
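Artifact parsers usually need to find files before reading them. Here is a minimal sketch of a recursive walk over a `pytsk3.FS_Info`, such as the handler's `fsobj` attribute. `walk` is a hypothetical helper; it relies on `open_dir(path=...)` and on each entry's `info.name.name` and `info.meta.type`. The `dir_type` parameter exists only so the function can be exercised without pytsk3; by default it uses `pytsk3.TSK_FS_META_TYPE_DIR`. On real NTFS images, expect virtual folders such as `$OrphanFiles` to appear as well.

```python
def walk(fs, path="/", dir_type=None):
    """Yield (full_path, is_dir) for every entry below `path`, depth first."""
    if dir_type is None:
        import pytsk3  # only needed when walking a real image
        dir_type = pytsk3.TSK_FS_META_TYPE_DIR
    for entry in fs.open_dir(path=path):
        name = entry.info.name.name  # bytes in pytsk3
        if name in (b".", b".."):
            continue
        full = path.rstrip("/") + "/" + name.decode("utf-8", "replace")
        meta = entry.info.meta  # may be None for some deleted entries
        is_dir = meta is not None and meta.type == dir_type
        yield full, is_dir
        if is_dir:
            yield from walk(fs, full, dir_type)
```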
## Conclusion
Today we looked at evidence files. Two types were explained: first the E01 file, and second the raw file. These are the file extensions you will see most often while investigating a digital forensic case.
We also saw how the CRC and MD5 hash verify the credibility of an evidence file while learning about its structure.
Beyond analyzing evidence with existing software, there was also a section on writing your own Python program that reads E01 or raw images. I hope many investigators gain the ability to pre-process data directly from evidence files without any complicated extra steps.