What can forensics learn from a file?

By The Forensics Media team

A media file can reveal a surprising amount: often when and where it was made, sometimes the device that made it, and whether it has been edited or simply recycled from another context. What media forensics will not do is prove any of this outright. Every method reports how strongly the evidence supports a question, every answer depends on whether the file still carries the relevant trace, and the most reliable readings come from original files where several independent methods agree. This guide maps what is knowable, method by method, and how far to trust each answer.

| The question | What a file can reveal | How far to trust it | Go deeper |
| --- | --- | --- | --- |
| What made it? | Camera or device model from metadata; the exact camera from sensor noise | Metadata is fakeable; sensor matching is strong but needs the device and a clean file | What camera took a photo |
| When and where? | Timestamps and GPS; for audio, the mains hum (ENF) can date and place a recording | Metadata trivially faked; physical traces strong when present, often absent | (audio deep-dive coming) |
| Has it been edited? | Compression mismatches, cloned regions, noise and lighting breaks | Each a lead, not proof; resaving erases most of it | Is ELA reliable? |
| Is it in its real context? | Earlier copies found by reverse search; geolocation | Misses cropped or uncatalogued images | Verify if an image is real |

Media forensics answers questions, it does not prove them

The single most important thing to understand is that no media-forensic method outputs “real” or “fake.” Each one weighs whether a specific property of the file is consistent with an untouched original. Media forensics inherits this discipline from forensic science as a whole: under the evaluative framework set out by the European Network of Forensic Science Institutes (ENFSI, 2015), findings are reported as a strength of support for a proposition rather than as a verdict. That support is placed on a graded ordinal scale of evidential strength, from no support at all up to the strongest level (Nordgaard, Ansell, Drotz and Jaeger, 2012). A finding reads as “this strongly supports that the photo was edited,” never “this proves it,” because almost every trace has an innocent explanation as well as a guilty one.
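
As a rough illustration of what "strength of support" means in practice, the sketch below computes a likelihood ratio (how much more probable a trace is if the file was edited than if it was not) and maps it onto an ordinal verbal scale. The function names, the cut points and the labels are hypothetical, chosen only for this example; they are not the thresholds published by ENFSI or by Nordgaard et al.

```python
from bisect import bisect_left

# Hypothetical cut points and labels, for illustration only.
# They are NOT the published ENFSI or Nordgaard et al. values.
CUTS = [1.0, 10.0, 100.0, 1000.0]
LABELS = [
    "no support",
    "weak support",
    "moderate support",
    "strong support",
    "very strong support",
]


def likelihood_ratio(p_trace_if_edited: float, p_trace_if_original: float) -> float:
    """How much more probable the trace is under 'edited' than under 'original'."""
    if p_trace_if_original <= 0:
        raise ValueError("p_trace_if_original must be positive")
    return p_trace_if_edited / p_trace_if_original


def verbal_support(lr: float) -> tuple[str, str]:
    """Return (favoured proposition, strength label) for a likelihood ratio."""
    if lr <= 0:
        raise ValueError("a likelihood ratio must be positive")
    if lr == 1:
        return ("neither", LABELS[0])
    favoured = "edited" if lr > 1 else "original"
    # A ratio below 1 supports the other proposition by its reciprocal.
    magnitude = lr if lr > 1 else 1 / lr
    return (favoured, LABELS[bisect_left(CUTS, magnitude)])


# Made-up probabilities: the trace is seen in 80% of edited files
# and in 2% of untouched originals.
print(verbal_support(likelihood_ratio(0.8, 0.02)))
```

With these invented inputs the ratio is about 40, which the illustrative scale reports as moderate support for "edited". The output is a graded statement about the evidence, not a verdict about the file.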

What made it, and when and where

The first questions an examiner asks are about origin. For a photo, the metadata names the camera and may carry a timestamp and GPS coordinates, though every field is editable and the whole block is often stripped when the file is re-saved (Can EXIF data be faked?). The far harder trace to fake is sensor noise: the method introduced by Lukáš, Fridrich and Goljan (2006), and extended into an integrity test by Chen, Fridrich, Goljan and Lukáš (2008), ties an image to one physical camera by its Photo-Response Non-Uniformity (PRNU) pattern, with a false-reject rate under 1 percent at a false-accept rate of one in a thousand on clean files. It needs the candidate camera to compare against, and it decays fast once a file is processed. For audio, the richest origin-and-time signal is the Electrical Network Frequency, the faint mains hum a recording picks up from the power grid: matched against a continuous frequency archive, it can date a clip and place it to a grid region. Cooper (2009) matched a 70-minute Glasgow recording to a London archive 676 km away, and matched a 2-minute extract against a 36-day archive. The pattern across media is constant: metadata answers fast but lies easily, while physical traces are hard to fake yet often simply absent.
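
The ENF idea can be sketched in a few lines: estimate the hum frequency once per second, then slide that sequence along an archive of grid frequencies and keep the best fit. Everything here is an assumption made for the example: a 50 Hz nominal grid, a 400 Hz sample rate, one-second frames, a made-up archive, and the helper names `estimate_enf`, `best_offset` and `synth_hum`. It is not Cooper's method, and real ENF work uses more careful estimators and much noisier recordings.

```python
import math
import random


def estimate_enf(samples, fs, nominal=50.0, span=0.5, step=0.01):
    """Return one hum-frequency estimate (Hz) per one-second frame."""
    n = int(fs)
    # A Hann window suppresses leakage from outside the band of interest.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    count = int(round(2 * span / step)) + 1
    candidates = [nominal - span + k * step for k in range(count)]
    tables = [
        ([math.cos(2 * math.pi * f * i / fs) for i in range(n)],
         [math.sin(2 * math.pi * f * i / fs) for i in range(n)])
        for f in candidates
    ]
    estimates = []
    for start in range(0, len(samples) - n + 1, n):
        frame = [s * w for s, w in zip(samples[start:start + n], window)]
        best_f, best_power = nominal, -1.0
        # Evaluate the spectrum only at candidate frequencies near the nominal.
        for f, (cos_t, sin_t) in zip(candidates, tables):
            re = sum(x * c for x, c in zip(frame, cos_t))
            im = sum(x * s for x, s in zip(frame, sin_t))
            power = re * re + im * im
            if power > best_power:
                best_f, best_power = f, power
        estimates.append(best_f)
    return estimates


def best_offset(query, archive):
    """Slide the query along the archive; return (offset, mean squared error)."""
    best = (0, float("inf"))
    for off in range(len(archive) - len(query) + 1):
        segment = archive[off:off + len(query)]
        mse = sum((q - a) ** 2 for q, a in zip(query, segment)) / len(query)
        if mse < best[1]:
            best = (off, mse)
    return best


def synth_hum(freqs, fs):
    """Synthesize a phase-continuous hum whose frequency changes once per second."""
    phase, out = 0.0, []
    for f in freqs:
        for _ in range(int(fs)):
            out.append(math.sin(phase))
            phase += 2 * math.pi * f / fs
    return out


# Toy data: a 120-second "grid archive" and a 20-second recording
# that really starts 37 seconds into it.
rng = random.Random(0)
archive = [50.0]
for _ in range(119):
    drift = rng.choice([-0.02, -0.01, 0.0, 0.01, 0.02])
    archive.append(min(50.2, max(49.8, archive[-1] + drift)))
recording = synth_hum(archive[37:57], fs=400)
offset, mse = best_offset(estimate_enf(recording, fs=400), archive)
print(offset, mse)
```

On this clean synthetic hum the search should recover the true offset of 37 seconds. A real recording adds noise, speech and codec artifacts on top of the hum, which is one reason the trace is often simply absent.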

Has it been edited, or taken out of context?

Two very different questions hide inside “is this fake?” The first is whether the pixels or samples were altered, which an examiner approaches by fusing several weak signals rather than trusting one. In the Forensics Media team’s review of the major image-forensics toolkits, no single filter is offered as a verdict: Error Level Analysis, clone detection, double-JPEG analysis and noise maps each catch a different kind of edit, and each raises false alarms on its own. Even the strongest modern detectors are imperfect and condition-dependent. The CNN camera-model fingerprint Noiseprint was the best single method in its own nine-dataset benchmark yet still averaged a Matthews correlation of only 0.403, and its authors caution that its best splicing score came on “a simple dataset, with large splicings and uncompressed images” (Cozzolino and Verdoliva, 2020). TruFor, a state-of-the-art forgery localizer, reports an average F1 of 0.696 and ships a built-in reliability map marking where its own output is unsafe to trust (Guillaro et al., 2023). Attribution itself can be actively fooled: the SpoC attack trains a network to inject a target camera’s fingerprint into a synthetic image, defeating camera attribution (Cozzolino et al., 2021). The second question, whether an unaltered file is being shown in a false context, is one no pixel test can answer; that is an investigation of where the file came from (How to verify if an image is real).
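
To make the two scores quoted above concrete, here is a minimal sketch of how F1 and the Matthews correlation coefficient are computed from the four confusion counts of a forgery localizer. The formulas are the standard ones; the pixel counts are invented for illustration and do not come from either paper, whose averaging protocols may also differ.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correlation between prediction and truth, from -1 to 1; 0 is chance level."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented example: 1,000 pixels, 100 of them actually forged.
# Detector A finds 60 forged pixels and raises 40 false alarms.
f1_a = f1_score(tp=60, fp=40, fn=40)
mcc_a = matthews_corrcoef(tp=60, tn=860, fp=40, fn=40)

# Detector B flags every pixel as forged.
f1_b = f1_score(tp=100, fp=900, fn=0)
mcc_b = matthews_corrcoef(tp=100, tn=0, fp=900, fn=0)

print(round(f1_a, 3), round(mcc_a, 3))  # 0.6 0.556
print(round(f1_b, 3), round(mcc_b, 3))  # 0.182 0.0
```

Detector B still earns a non-zero F1 by flagging everything, while its Matthews correlation is zero. That is why scores in the 0.4 to 0.7 range describe a useful lead rather than a verdict.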

What has happened to the file since

A file also carries the marks of its own history. Re-encoding leaves traces of how many times an image was compressed and which codecs or platforms it passed through, so a forensic read can often tell that a photo came through a particular app, or that audio was re-encoded after an edit. The limit is depth: this history is generally recoverable only about one layer back, because the last encoder overwrites the evidence of the earlier ones. It is useful for spotting that a file is not the pristine original it claims to be, far less so for reconstructing its full life story.
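
One well-known trace of this kind is the double-quantization effect: values quantized with one step size and then re-quantized with another leave periodic gaps in their histogram. The toy below shows that effect on a made-up sweep of numbers. The step sizes 5 and 3 and the helper names are assumptions for the example; real detectors look at histograms of JPEG DCT coefficients rather than at raw values like these.

```python
from collections import Counter


def quantize(values, step):
    """Round each value to the nearest multiple of step, as a lossy encoder would."""
    return [round(v / step) * step for v in values]


def bin_histogram(values, step):
    """Count how many values land in each quantization bin of the given step."""
    return Counter(round(v / step) for v in values)


def empty_bins(hist):
    """Bins between the lowest and highest occupied bin that hold nothing."""
    lo, hi = min(hist), max(hist)
    return [b for b in range(lo, hi + 1) if hist[b] == 0]


# Made-up "coefficients": an even sweep from -60 to 60 in steps of 0.1.
coeffs = [i / 10 for i in range(-600, 601)]

single = bin_histogram(coeffs, 3)               # saved once with step 3
double = bin_histogram(quantize(coeffs, 5), 3)  # saved with step 5, then step 3

print(empty_bins(single))  # []
print(empty_bins(double))  # gaps at ..., -4, -1, 1, 4, 6, 9, 11, ...
```

The file saved once fills every bin, while the file saved twice shows a regular pattern of empty bins that betrays the earlier step size. A third save would reshape the histogram again, which is consistent with the one-layer limit described above.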

The rule behind every answer

One principle decides how much any of this is worth: traces are fragile, and ordinary handling destroys them. Resaving, recompression, screenshotting and platform re-encoding strip the faint detail most methods depend on, and a single mismatched processing chain alone can drop a PRNU sensor match by about 62 percent (Joshi et al., 2020). A heavily shared file has often lost the very evidence forensics would read, which is why a clean result on such a file is meaningless rather than reassuring. A finding earns trust only when it comes from an original file and several independent methods agree; that standard is set out in full in the guide How reliable is photo forensics? Used that way, media forensics is a genuinely powerful way to raise or lower confidence with evidence; used as a single button that declares a file fake, it promises far more than the science can deliver.
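
The fragility of a sensor trace can be seen even in a toy simulation. The sketch below gives two hypothetical cameras a random per-pixel gain pattern, estimates each fingerprint from flat-field shots, and correlates a query photo's noise residual against both fingerprints, before and after a crude "resave". Every parameter is an assumption: the 48-pixel images, the 2 percent pattern strength, the 3x3 mean filter used as a denoiser, and coarse pixel rounding standing in for lossy compression. Real PRNU pipelines typically use wavelet-based denoising and statistics such as peak-to-correlation energy, and this toy does not reproduce the figure reported by Joshi et al.

```python
import math
import random

SIZE = 48  # toy image side length in pixels


def make_pattern(seed, strength=0.02):
    """Hypothetical per-pixel gain pattern standing in for a sensor's PRNU."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, strength) for _ in range(SIZE)] for _ in range(SIZE)]


def shoot(pattern, scene, rng, noise=1.0):
    """Simulate one exposure: scene * (1 + pattern) plus random sensor noise."""
    return [[scene[y][x] * (1.0 + pattern[y][x]) + rng.gauss(0.0, noise)
             for x in range(SIZE)] for y in range(SIZE)]


def residual(img):
    """Noise residual: pixel minus its 3x3 neighbourhood mean (interior only)."""
    out = []
    for y in range(1, SIZE - 1):
        for x in range(1, SIZE - 1):
            mean = sum(img[y + dy][x + dx]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
            out.append(img[y][x] - mean)
    return out


def fingerprint(pattern, rng, shots=8):
    """Average the residuals of several flat-field shots from one camera."""
    flat = [[128.0] * SIZE for _ in range(SIZE)]
    residuals = [residual(shoot(pattern, flat, rng)) for _ in range(shots)]
    return [sum(values) / shots for values in zip(*residuals)]


def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)
camera_a, camera_b = make_pattern(seed=10), make_pattern(seed=20)
fp_a, fp_b = fingerprint(camera_a, rng), fingerprint(camera_b, rng)

# Query photo from camera A: a smooth left-to-right brightness ramp.
scene = [[100.0 + 60.0 * x / (SIZE - 1) for x in range(SIZE)] for _ in range(SIZE)]
photo = shoot(camera_a, scene, rng)
# Crude stand-in for a lossy resave: snap every pixel to a multiple of 16.
resaved = [[round(v / 16) * 16 for v in row] for row in photo]

corr_same = correlation(residual(photo), fp_a)
corr_other = correlation(residual(photo), fp_b)
corr_resaved = correlation(residual(resaved), fp_a)
print(corr_same, corr_other, corr_resaved)
```

In this simulation the untouched photo correlates strongly with its own camera's fingerprint and hardly at all with the other camera's, and the "resaved" copy correlates less strongly than the original. The exact numbers depend entirely on the invented parameters; the point is only the direction of the change.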

Sources

  • European Network of Forensic Science Institutes (2015). ENFSI Guideline for Evaluative Reporting in Forensic Science (STEOFRAE).
  • Nordgaard, Ansell, Drotz, Jaeger (2012). Scale of conclusions for the value of evidence. Law, Probability and Risk 11(1):1-24. DOI: 10.1093/lpr/mgr020
  • Lukáš, Fridrich, Goljan (2006). Digital Camera Identification from Sensor Pattern Noise. IEEE Transactions on Information Forensics and Security 1(2):205-214. DOI: 10.1109/TIFS.2006.873602
  • Chen, Fridrich, Goljan, Lukáš (2008). Determining Image Origin and Integrity Using Sensor Noise. IEEE Transactions on Information Forensics and Security 3(1):74-90. DOI: 10.1109/TIFS.2007.916285
  • Cozzolino, Verdoliva (2020). Noiseprint: A CNN-Based Camera Model Fingerprint. IEEE Transactions on Information Forensics and Security 15:144-159. DOI: 10.1109/TIFS.2019.2916364
  • Guillaro, Cozzolino, Sud, Dufour, Verdoliva (2023). TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization. CVPR 2023. DOI: 10.1109/CVPR52729.2023.01974
  • Cozzolino, Thies, Rössler, Nießner, Verdoliva (2021). SpoC: Spoofing Camera Fingerprints. CVPR Workshops 2021. DOI: 10.1109/CVPRW53098.2021.00110
  • Joshi, Korus, Khanna, Memon (2020). Empirical Evaluation of PRNU Fingerprint Variation for Mismatched Imaging Pipelines. IEEE International Workshop on Information Forensics and Security (WIFS) 2020. DOI: 10.1109/WIFS49906.2020.9360911
  • Cooper, A. J. (2009). The Electric Network Frequency (ENF) as an Aid to Authenticating Forensic Digital Audio Recordings: An Automated Approach. AES 33rd International Conference on Audio Forensics.