What makes forensics a science?

Throughout this class, we will present many techniques for recovering forensic evidence from computer systems. The skills you will learn can be applied in many different scenarios. For example, recovering erased data is useful when data has simply been deleted by accident; no crime needs to be involved.

Under what conditions is the practice of forensics a science rather than a series of related techniques that recover data? When the investigator follows a repeatable, structured process for gathering evidence and uses strong inductive reasoning to reach conclusions, as we explain below.

Specifically, a scientific forensic investigator takes three critical steps in an investigation.

  1. The process begins when an investigator has judged that an alleged crime or other event is worth investigating.
  2. Next, the investigator gathers evidence.
  3. Finally, a hypothesis is supported that best explains the events that took place.

The first step is dictated by the resources available to the investigator and by the law’s definition of crime, civil law, or a company’s internal policy. For example, there are more crimes and criminals than law enforcement can handle, and investigations often have to be prioritized. Additionally, an investigator may not act if the event is not within their authority (determined by law or policy) or not within their responsibility (perhaps responsibilities are divided). And finally, the initial evidence may not meet a standard of proof that an investigation is warranted or authorized.

Evidence gathered in Step 2 has its origin in the transfer of artifacts at a crime scene, as described by Locard’s Exchange Principle. How evidence is gathered will greatly affect the validity of the results; however, this is a topic we leave for later. The investigator attempts to refine artifacts into evidence in four stages: identification (determining an artifact’s class characteristics); individualization (narrowing the class to one); association (linking a person with a crime scene through the individualized evidence); and intentionality (inferring the intent of the person). Each stage is challenging: for example, not all artifacts may be found, and intent is often hard to discern.

In the last step, the investigator finds that a hypothesis can be supported by the evidence that answers questions specific to forensics: what crime, who did it, what was their intent, what were their specific actions? The type of reasoning employed by the investigators to answer these questions determines whether forensics is a science. In general, there are three options, but only the last option provides a scientific argument.

I found this great write up by chance that is better than anything I’ve ever written: https://www.merriam-webster.com/words-at-play/deduction-vs-induction-vs-abduction

Abductive Reasoning: the investigator reasons about the crime based on the most likely explanation. The investigator observes fact A and then determines an event B; the investigator is saying that A entails B. For A to entail B means that A cannot be true without B also being true.

For our case of Anne Adams: I observe that the document properties contain Acme’s name, and it is likely that such information was filled in by Microsoft Word automatically when the document was created at Acme; therefore, this document was created while Anne was at Acme.

A generous way of describing abductive reasoning is to say that it is an application of Occam’s Razor, which states that “Unless there is reason to believe otherwise, the simplest solution is the best.” You are saying “B is the most likely reason that I’ve observed A, because from what I know of the world, that’s the simplest explanation.” It’s not a strong type of reasoning. It’s assertion based on vague experience; it’s using your gut and is therefore easily affected by bias. A more accurate description is to say that abductive reasoning is unsound: we cannot (or at least, should not) assume ahead of time the point we are trying to prove. In reality, an investigation’s hypothesis often starts this way, but it is not a line of reasoning that is worth testifying over. A sound forensic method shares much in common with the scientific method in that observations confirm or refute a hypothesis that explains events.

Deductive Reasoning: the investigator reasons about the crime by constructing truths based on axiomatic assumptions. A deductive investigator assumes a general truth A, and derives B as a consequence. Deductive Reasoning is stronger than abductive because you start with what you consider to be “generally accepted facts” (A) and then determine that what you observe is true (B). For example:

[Figure: Deductive reasoning]

In deductive reasoning, you start with what you know to be true to deduce what you observe. It’s stronger reasoning for sure. You can also start with what you assume to be true:

I assume that Microsoft Word always automatically fills in a document’s properties with the author’s personal information; therefore, we can deduce that this document was created while Adams was at Acme.

The problem with deductive reasoning based on assumptions is that we are at the mercy of our assumptions: if they are wrong, our conclusions may be wrong as well.

Inductive Reasoning: the investigator reasons about the crime by what is observed to be true independent of the case. Here, we assume it is true that B follows from A because we have perceived it to be the case repeatedly. Here the investigator uses many observations to determine a probability (ideally a strong probability) that the inference is true.

In our scenario, an inductive investigator makes the claim:

From my repeated experience, I hypothesize that Microsoft Word always automatically fills in a document’s properties with the author’s personal information; therefore, we can infer that this document is an instance of my hypothesis and was created while Adams was at Acme.

The limitation of inductive reasoning is that we are at the mercy of our observations: if they are not an accurate sample of all possible outcomes, then we risk inferring the wrong conclusion. The good news is that scientists know how to perform careful observations and to draw conclusions appropriately. It is important to note, however, that just because we have used inductive reasoning does not mean we are correct: the above hypothesis ignores the possibility that Adams used a copy of the program that Acme had installed, but used it after she left the company. Moreover, a third party can easily change the information. (Later, we’ll build on inductive reasoning to make it stronger by adding things such as repeated experimentation, known error rates, standard methods, and peer review.)
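To make "many observations determine a probability" concrete, here is a minimal sketch. The observations are hypothetical (200 invented trials in which Word filled in the company field every time), and the 95% confidence level is an assumption chosen for illustration. It shows that even a perfect record only bounds the failure rate; it does not prove the hypothesis.

```python
# Hypothetical data: True means Word auto-filled the company field in that trial.
observations = [True] * 200

n = len(observations)
successes = sum(observations)
observed_rate = successes / n

# With zero failures in n independent trials, the largest failure rate p that is
# still consistent with that outcome at 95% confidence solves (1 - p)**n = 0.05.
upper_failure_rate = None
if successes == n:
    upper_failure_rate = 1 - 0.05 ** (1 / n)

print(f"Observed rate: {observed_rate:.3f} over {n} trials")
if upper_failure_rate is not None:
    print(f"Failure rate could still be as high as {upper_failure_rate:.2%}")
```

The bound shrinks as the number of independent trials grows, which is one reason repeated testing by independent parties strengthens an inductive claim.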

Which of these three types of reasoning did you apply to formulate and justify a hypothesis of Adams’s alleged crime? Was your reasoning based on the presence of the serial number; in real life would you have verified that the particular camera stamped each photo with its serial number? Did you know that EXIF information is as easily modified as Word document properties (or any other unsigned digital data)?

Forensics is a science when inductive reasoning is used. Inductive reasoning is strongest when hypotheses are repeatedly verified by independent parties. Investigators rely heavily on validation studies that perform repeated and precise tests on equipment and software to determine what can be said with assurance about evidence. Validation reports are published by government agencies, such as the National Institute of Standards and Technology (NIST), and by industry and academic researchers in peer-reviewed journals and conference proceedings.

Finally, we note that the most conservative view of inductive reasoning is that the investigator’s theory can only be negative – much like the traditional scientific method where we can at best reject the null hypothesis.

That is, we gain knowledge only from theories that are provably false. For example, we can be sure that the theory that Adams created the document at some third company is false if she never set foot in that company’s door. Any theory that Adams created the document at Acme is true only in the sense that we haven’t yet observed evidence that proves it false. This viewpoint is quite pessimistic! However, it is important to understand that more often than not, the complete facts are hidden from the investigator, and although a hypothesis fits, that does not mean it is correct. That analysis is left for judge and jury and not the investigator.

Digital evidence is circumstantial

There are many advantages to digital evidence, but investigators, and courts, must realize there are many limitations. The primary limitation is that digital evidence is often circumstantial — it is indirect evidence of an event, and we can infer a fact from its presence. For example, in the Adams case, all our evidence was circumstantial. We did not use the content of the photo as direct evidence; we used information recorded in the EXIF tags to infer its origin. From a legal perspective, “direct” evidence is directly observable and speaks for itself – direct evidence proves a fact without inference. Examples of direct evidence include photos, video, recorded sound, DNA, and human witnesses to an event (including confession).

Digital evidence is often modifiable. In the Adams case, we assumed that her copy of MS Word placed Acme’s information in the properties of the Word document; however, you should check for yourself that you are able to modify that evidence quite easily and save the document. The new document would have a new timestamp, but you can trivially get around this problem by resetting your computer’s clock, or less trivially by editing the file’s timestamp directly.
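As a rough illustration of how little effort editing a timestamp takes, the sketch below creates a throwaway file and rewrites its modification time with os.utime() from the standard library. The file name and the chosen date are hypothetical, and this only touches the timestamp the file system reports, not any dates stored inside a document's own properties.

```python
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical date, chosen only for illustration.
fake_time = datetime(2001, 1, 1, 12, 0, 0, tzinfo=timezone.utc).timestamp()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "report.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("contents written today\n")

    before = os.stat(path).st_mtime
    # os.utime takes (access time, modification time) in seconds since the epoch.
    os.utime(path, (fake_time, fake_time))
    after = os.stat(path).st_mtime

print("modified before:", datetime.fromtimestamp(before, tz=timezone.utc))
print("modified after: ", datetime.fromtimestamp(after, tz=timezone.utc))
```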

Many later homework assignments in this class will contain evidence generated by your instructor! While it was time consuming to create these assignments, it was not difficult, as the evidence is embedded automatically by programs in their normal course of operation. Regardless, it is important to realize that digital evidence should not be considered as absolute fact when it is found (as one might consider DNA evidence), but that does not make it weaker than other types of circumstantial evidence.

You might now be asking, is there any value at all to digital evidence then? The answer is yes.

First, most evidence at a crime scene is indirect evidence. For any crime for which there are not witnesses, the case must be circumstantial. Moreover, direct evidence from witnesses is not always reliable; people do not have perfect perception or memory and are often biased.

Second, indirect evidence can be strong if there is other corroborating evidence, a notion that Locard’s Exchange Principle speaks to. For example, let’s say that it has been alleged that John committed a crime against Jane, and John claims to not know her at all. Investigators find that John’s Web browser has a history of pages he has visited recently, including the text and images from those pages. Jane’s public Web page is found in that history cache, and it is used as indirect evidence that he knew Jane before the crime took place. John’s browser will record when exactly John last visited Jane’s page, and such facts can be corroborated by examining the Web server that hosts Jane’s Web page. Furthermore, as we will discuss later in the semester, other logs at John’s Internet Service Provider may be able to confirm indirectly that his computer was connected to the Internet at the time the page was viewed. Logs from John’s email server may indicate he checked or sent email at the time when the Web page was retrieved; if he admits to keeping his account and password secret from others, then the email server logs indicate it was he at the keyboard at the time.
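Corroboration of this kind often comes down to lining up timestamps from independent sources. The sketch below is illustrative only: the records, addresses, and the 30-second tolerance for clock differences are all invented, and real browser histories and server logs have their own formats that must be parsed first.

```python
from datetime import datetime, timedelta

# Hypothetical records, invented for illustration.
browser_history = [
    ("http://example.com/jane", datetime(2021, 3, 8, 13, 2, 45)),
]
server_log = [
    ("/jane", "203.0.113.7", datetime(2021, 3, 8, 13, 2, 47)),
    ("/jane", "198.51.100.9", datetime(2021, 3, 9, 9, 15, 0)),
]

def corroborating_entries(visit_time, log, tolerance=timedelta(seconds=30)):
    """Return the log entries whose timestamp is within tolerance of visit_time."""
    return [entry for entry in log if abs(entry[2] - visit_time) <= tolerance]

url, visit_time = browser_history[0]
matches = corroborating_entries(visit_time, server_log)
for page, address, when in matches:
    print(f"{url} visit at {visit_time} matches request for {page} from {address} at {when}")
```

A match like this is still circumstantial: it shows two independent records agree, not who was at the keyboard.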

Third, circumstantial evidence can lead to direct evidence. Other stored pages in John’s Web browser history may lead investigators to John’s friend, who may confirm directly that John knew Jane. Moreover, when presented with indirect evidence, suspects may be persuaded to confess to a crime. For criminal cases, a true confession is just as good or better than pursuing a guilty verdict at a trial.

Investigations

Investigation is the core mechanism of digital forensics, but not all investigations involve an alleged crime. Generally, the techniques and processes presented in this class are applicable to five types of investigations.

Criminal investigators have the most restrictions on their actions, and their results will come under the most scrutiny, whether as part of a prosecution or a defense. Civil investigations have a similarly high bar, although plaintiffs in civil hearings must present only a preponderance of evidence to a judge or jury. Incident response investigations are typically carried out by the owners of the systems or data that were violated and are not in the context of the laws that bind the actions of criminal and civil investigations. Although precision is still the goal of the investigator, a best guess is often more acceptable in this context. Intelligence gathering has few constraints, and the results inform a policy or strategy by an organization or government. Finally, malicious activity that seeks to invade someone’s privacy may obey no constraints, and is thus a potent threat model.

Some notes on Python

A quick overview of some useful things to know about Python. (Most code examples below are from https://learnxinyminutes.com/docs/python3/ which is worth looking over in detail.)

You can start the python interpreter by typing python at the command line:

> python
Python 2.7.18 (default, Mar  8 2021, 13:02:45) 
[GCC 9.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

But notice on this machine, python defaults to Python2, not Python3. Press CTRL-D (or on Windows, CTRL-Z) to exit the interpreter, then if needed, use python3 or python3.5 to get the right version:

> python3
Python 3.9.9 (main, Nov 21 2021, 03:23:42) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

The interpreter runs in a REPL: a Read-Eval-Print Loop, which is excellent for exploratory programming. You type in an expression (it’s “read”), the interpreter “eval”uates it (and updates state if necessary), then the result of the expression is “print”ed, and the process starts again.

Some stuff you might find useful lives in the Python Standard Library, which is well documented here.

Here are some built-in functions:

print(), like, prints stuff. It can take more than one object and prints their string representation to a file (stdout by default). You can point it at other files with the file= argument; you can change the separator between objects with the sep= argument; you can change the end of line marker (by default: a newline) with the end= argument.
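A short sketch of those three keyword arguments; the strings are arbitrary, and io.StringIO stands in for a real file so the example needs nothing on disk.

```python
import io
import sys

print("offset", 512, "length", 64)        # defaults: sep=" " and end="\n"
print("de", "ad", "be", "ef", sep="")     # no separator between objects
print("reading sector 0", end=" ... ")    # stay on the same line
print("done")
print("warning: short read", file=sys.stderr)  # send output to stderr instead

# Any object with a write() method works as file=; here, an in-memory buffer.
buffer = io.StringIO()
print("a", "b", "c", sep=",", end=";", file=buffer)
captured = buffer.getvalue()
print(repr(captured))
```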

open() in a with block, which closes the file when the block ends – note the mode parameter to open the file in binary! If you read() from a file open in binary mode, you get back a bytes object. You can index it like an array to get the nth byte, which will be returned as an int; or you can “slice” it just like any other sequence in Python. Bytes objects when displayed in the REPL show the ASCII character if the byte is printable, or \xHH if it’s not, where HH will be the hexadecimal value of the byte.
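Here is a minimal, self-contained sketch of reading a file in binary mode. It first writes a small sample file (the name sample.bin is hypothetical) containing the 8-byte PNG signature, then reads it back and indexes and slices the result.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.bin")  # hypothetical file name
    with open(path, "wb") as f:                # "wb": write, binary
        f.write(b"\x89PNG\r\n\x1a\n")          # the 8-byte PNG file signature

    with open(path, "rb") as f:                # "rb": read, binary
        data = f.read()                        # file is closed when the block ends

first_byte = data[0]   # indexing a bytes object gives an int
name = data[1:4]       # slicing gives another bytes object

print(type(data))
print(first_byte)
print(name)
print(data)
```

When data is displayed, a few common control bytes appear as short escapes such as \r and \n rather than in \xHH form.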

hex() returns the hex value (as a string) of a given int; might be useful when converting individual bytes to hex.
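For instance, a small sketch applying hex() to each byte of a bytes object (the sample bytes are arbitrary). Note that hex() does not zero-pad, and that bytes objects also have their own .hex() method, which returns the whole sequence as one string with no 0x prefix.

```python
data = b"\x89PNG"

as_hex = [hex(b) for b in data]   # iterating over bytes yields ints
print(as_hex)

small = hex(10)                   # no zero padding: one hex digit
print(small)

whole = data.hex()                # two hex digits per byte, no "0x" prefix
print(whole)
```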

You need to learn Python string formatting. Most people use f-string formatting for python now. See tutorial/explanation here.

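>>> foo = "Brian"
>>> grade = 8/9
# round() returns a float, so it must be converted with str() before concatenation
>>> "Student " + foo + " received a grade of " + str(round(grade,2)) + "."
'Student Brian received a grade of 0.89.'
# f-strings are in-line and more readable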
>>> f"Student: {foo} received a grade of {round(grade,2)}."
'Student: Brian received a grade of 0.89.'
>>> f"Student: {foo} received a grade of {grade:2.2f}"
'Student: Brian received a grade of 0.89'
>>> f"Student: {foo} received a grade of {grade*100:.1f}%"
'Student: Brian received a grade of 88.9%'
>>> value = 10011 
>>> f"{value:#2x}"
'0x271b'
>>> f"{value:#2X}"
'0X271B'
>>> f"{value:#b}"
'0b10011100011011'
>>> value = 54
>>> print(f"The base 10 value is {value}. One more than that is {value+1}.")
The base 10 value is 54. One more than that is 55.
>>> print(f"The binary value is {value:b}, which is "
...       f"{value:08x} in hex with up to 8 leading zeros.")
The binary value is 110110, which is 00000036 in hex with up to 8 leading zeros.