Every few months, a slow news day leads to somebody, somewhere, buying an old PC, hard drive, or flash memory card off eBay, and then writing a story about how they were able to restore all the files that the previous owner had tried to erase before selling it.
If you want to sell hardware and you’re not sure how some people can recover data from supposedly erased hard drives, this article is for you.
I’m going to use this diagram to explain the whole thing. It represents data stored on a disk, such as a hard drive or the flash memory you get in digital cameras. It’s hugely reduced in size (even a floppy disk would be more than 2000 times bigger than this!) to simplify the explanations, but it’s good enough to illustrate the principles:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
It’s currently a totally blank disk. Each zero represents one byte of data.
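If it helps to see the diagram as code, here is a minimal Python sketch of it, assuming (as the diagram does) that the disk is just a flat run of bytes laid out in 26 rows of 26. The helper names `new_disk` and `offset` are hypothetical, made up for illustration, and the floppy comparison assumes a standard 1.44MB (1,474,560-byte) disk.

```python
import string

ROWS = COLS = 26
LETTERS = string.ascii_lowercase  # row and column labels 'a'..'z'


def new_disk() -> bytearray:
    """A totally blank disk: 26 x 26 = 676 bytes, all zero."""
    return bytearray(ROWS * COLS)


def offset(cell: str) -> int:
    """Turn a cell name such as 'ba' (row b, column a) into a byte offset."""
    row, col = cell
    return LETTERS.index(row) * COLS + LETTERS.index(col)


disk = new_disk()
print(len(disk))               # 676 bytes in the whole "disk"
print(offset("ba"))            # 26: row b starts right after row a
print(offset("zz"))            # 675: the very last byte
print(1_474_560 // len(disk))  # 2181: how many of our disks fit on a floppy
```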
Now, no working disk drive looks like this, even when it’s empty. The first thing a disk needs is a partition table. Many Windows PCs have only one partition, very slightly smaller than the capacity of the hard drive. But a traditionally partitioned (MBR) disk can hold up to four primary partitions, and more are possible using an extended partition or a newer partitioning scheme such as GPT.
So we partition our disk drive, and now the computer knows where it can store data:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a p a r t i t i o n 1 = b a – z z 0 0 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
We’ve defined a single partition that occupies the disk from the start of the second row (ba) to the end of the last row (zz). We can’t start storing data before ‘ba’, because hard drives devote a certain amount of space to partition tables, and in our case, it’s the whole of the first row.
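The partition table can be modelled the same way. This is a toy Python sketch, assuming the entry is stored as plain text in row ‘a’ with an ordinary hyphen where the diagram shows a dash; `write_partition_table` and `read_partition_table` are hypothetical names, and real partition tables store binary sector numbers rather than text.

```python
import string

COLS = 26
LETTERS = string.ascii_lowercase


def offset(cell: str) -> int:
    """Byte offset of a two-letter cell name such as 'ba'."""
    row, col = cell
    return LETTERS.index(row) * COLS + LETTERS.index(col)


def write_partition_table(disk: bytearray, start: str, end: str) -> None:
    """Record one partition in row 'a', e.g. b'partition1=ba-zz'."""
    entry = f"partition1={start}-{end}".encode("ascii")
    disk[0:COLS] = entry.ljust(COLS, b"\0")  # pad the rest of row 'a' with zeros


def read_partition_table(disk: bytearray) -> tuple:
    """Return the (first, last) byte offsets of partition 1."""
    entry = disk[0:COLS].rstrip(b"\0").decode("ascii")
    start, end = entry.split("=")[1].split("-")
    return offset(start), offset(end)


disk = bytearray(26 * 26)
write_partition_table(disk, "ba", "zz")
print(read_partition_table(disk))  # (26, 675): everything except row 'a'
```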
Next, we need to format our partition. In Windows, that means either NTFS or FAT; other OSes use other filesystems. We’re going to use an imaginary one to keep things simple. (To save space, I’m not going to show all the empty rows in the following diagrams.)
a b c d e f g h i j k l m n o p q r s t u v w x y z
a p a r t i t i o n 1 = b a – z z 0 0 0 0 0 0 0 0 0 0
b f o r m a t = c a – z z 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Again, we’ve devoted a complete row (row ‘b’), this time to information about our formatted partition. The remaining 24 rows, ‘c’ to ‘z’, are now ready for writing. In this particular filesystem, the first of those rows, ‘c’, acts as a directory: it tells the computer where each file’s contents are stored. We’re going to add a file, “credit.txt”, a text file that holds our credit card number.
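Here is a toy Python sketch of the same layout, assuming the directory holds a single entry of the form `name=start-end` in row ‘c’, as in the diagram. `write_row` and `read_file` are hypothetical helpers; a real filesystem’s directory is far more elaborate, but the lookup works on the same principle: read the directory entry, then follow it to the contents.

```python
import string

COLS = 26
LETTERS = string.ascii_lowercase


def offset(cell: str) -> int:
    row, col = cell
    return LETTERS.index(row) * COLS + LETTERS.index(col)


def write_row(disk: bytearray, row: str, text: str) -> None:
    """Write text at the start of a row, zero-padding the rest of it."""
    start = offset(row + "a")
    disk[start:start + COLS] = text.encode("ascii").ljust(COLS, b"\0")


def read_file(disk: bytearray, name: str) -> bytes:
    """Look the file up in the directory row ('c'), then fetch its bytes."""
    entry = disk[offset("ca"):offset("ca") + COLS].rstrip(b"\0").decode("ascii")
    if "=" not in entry:
        raise FileNotFoundError(name)
    entry_name, span = entry.split("=", 1)
    if entry_name != name:
        raise FileNotFoundError(name)
    start, end = span.split("-")
    return bytes(disk[offset(start):offset(end) + 1])


disk = bytearray(26 * 26)
write_row(disk, "a", "partition1=ba-zz")     # partition table
write_row(disk, "b", "format=ca-zz")         # filesystem header
write_row(disk, "c", "credit.txt=da-ds")     # directory entry
write_row(disk, "d", "1234-3212-3456-5432")  # the file's contents
print(read_file(disk, "credit.txt"))         # b'1234-3212-3456-5432'
```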
a b c d e f g h i j k l m n o p q r s t u v w x y z
a p a r t i t i o n 1 = b a – z z 0 0 0 0 0 0 0 0 0 0
b f o r m a t = c a – z z 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c c r e d i t . t x t = d a – d s 0 0 0 0 0 0 0 0 0 0
d 1 2 3 4 – 3 2 1 2 – 3 4 5 6 – 5 4 3 2 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The file’s directory entry is now in row ‘c’, and its contents, in row ‘d’, are clearly visible to anyone who does a simple scan of the disk.
Now this is where the problems start. We want to sell this drive, so we need to delete our credit card details from it. We delete the file, and this, we think, will delete the credit card details.
Right...?
Wrong. This is our filesystem after we delete the file:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a p a r t i t i o n 1 = b a – z z 0 0 0 0 0 0 0 0 0 0
b f o r m a t = c a – z z 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 1 2 3 4 – 3 2 1 2 – 3 4 5 6 – 5 4 3 2 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The file’s entry has been removed from row ‘c’, so the computer is presented with what it thinks is a blank disk. But the contents of the file are left untouched: only row ‘c’ has been altered. The file has been logically deleted, because to the computer, the disk appears empty. But it has not been physically deleted: it’s still there.
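To see why logical deletion isn’t enough, here is a small Python sketch of the toy disk, assuming “deleting” means nothing more than blanking row ‘c’, exactly as in the diagram. The helper `put` is hypothetical; the scan uses Python’s standard `re` module to look for anything shaped like a card number in the raw bytes, without consulting the directory at all.

```python
import re

COLS = 26
ROW = {r: i * COLS for i, r in enumerate("abcdefghijklmnopqrstuvwxyz")}  # start offset of each row


def put(disk: bytearray, row: str, text: str) -> None:
    """Overwrite a whole row with text, zero-padded."""
    disk[ROW[row]:ROW[row] + COLS] = text.encode("ascii").ljust(COLS, b"\0")


disk = bytearray(26 * 26)
put(disk, "a", "partition1=ba-zz")
put(disk, "b", "format=ca-zz")
put(disk, "c", "credit.txt=da-ds")
put(disk, "d", "1234-3212-3456-5432")

# "Deleting" the file only blanks its directory entry in row 'c'...
put(disk, "c", "")

# ...so the filesystem sees no files, but a raw scan still finds the number.
match = re.search(rb"\d{4}-\d{4}-\d{4}-\d{4}", disk)
print(match.group().decode())  # 1234-3212-3456-5432
```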
Perhaps, instead, we should have simply deleted the whole partition? Let’s see what this would have achieved:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b f o r m a t = c a – z z 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c c r e d i t . t x t = d a – d s 0 0 0 0 0 0 0 0 0 0
d 1 2 3 4 – 3 2 1 2 – 3 4 5 6 – 5 4 3 2 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Oh dear, this is even worse! The partition is gone, but all the information about the formatted filesystem and its contents is still there. With the data we’ve got, it’s very easy to re-create the partition table and restore every file within it. This makes it even easier for our malicious buyer to grab our credit card details!
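Recovering a deleted partition is the kind of job recovery tools do by scanning for recognisable filesystem headers. Here is a toy Python sketch for our imaginary filesystem only, assuming its header always begins with the text `format=` and that the partition ran to the end of the disk. `rebuild_partition_table` is a hypothetical name; real recovery tools deal with binary structures and many filesystem types.

```python
from typing import Optional

COLS = 26
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def put(disk: bytearray, row: int, text: str) -> None:
    """Overwrite row number `row` with text, zero-padded."""
    disk[row * COLS:(row + 1) * COLS] = text.encode("ascii").ljust(COLS, b"\0")


def rebuild_partition_table(disk: bytearray) -> Optional[str]:
    """Scan each row for a filesystem header and re-create the partition entry."""
    for row in range(1, 26):  # row 0 is where the partition table lives
        if disk[row * COLS:(row + 1) * COLS].startswith(b"format="):
            entry = f"partition1={LETTERS[row]}a-zz"  # assume it runs to the end of the disk
            put(disk, 0, entry)
            return entry
    return None


disk = bytearray(26 * 26)
put(disk, 1, "format=ca-zz")
put(disk, 2, "credit.txt=da-ds")
put(disk, 3, "1234-3212-3456-5432")
# Row 0 (the partition table) is blank: the partition has been "deleted".
print(rebuild_partition_table(disk))  # partition1=ba-zz
```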
The problem, in a nutshell, is that deleting never actually deletes the information. At best, it removes references to the information while leaving the information itself untouched.
In order to delete a file safely, what we really need to do is overwrite its actual contents. At this point, my bias starts to show through, because I think Linux users are considerably better off than Windows users here: most Linux distributions come with a tool that does this very thing. It’s called shred, and it’s part of the GNU coreutils package. If you’re a Windows user, either get hold of a Linux LiveCD such as Knoppix, or look up a Windows-specific secure deletion program on Google. I’m going to continue by talking about shred, but the principles are the same whatever you use.
Shred and its brethren simply overwrite a file’s contents with random data. As an example, let’s see what would happen if we shred credit.txt:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a p a r t i t i o n 1 = b a – z z 0 0 0 0 0 0 0 0 0 0
b f o r m a t = c a – z z 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c c r e d i t . t x t = d a – d s 0 0 0 0 0 0 0 0 0 0
d k 2 v @ ( j 5 Z £ ^ ! k a 8 * N 8 A , 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
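That’s better! The file is still there, but its contents are of no use to anybody. Shred learned from the directory row, ‘c’, that credit.txt’s data was located from ‘da’ to ‘ds’, and then wrote random data to that area of the disk. On our simple imaginary filesystem, if we now delete the file as usual (shred -u does the overwriting and the deletion in one step), the disk no longer holds any clue to our credit card number. Real filesystems are less tidy: the GNU shred documentation warns that it assumes data is overwritten in place, which isn’t true of journaling and other filesystems that may keep extra copies of your data elsewhere on the disk.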
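The same idea in a Python sketch of the toy disk, assuming (as in the diagram) that the directory entry in row ‘c’ gives the exact byte range of the file. `shred_file` is a hypothetical helper that only mimics the principle; it uses `os.urandom` for the random data, whereas the real shred has its own random source and options.

```python
import os
import string

COLS = 26
LETTERS = string.ascii_lowercase


def offset(cell: str) -> int:
    return LETTERS.index(cell[0]) * COLS + LETTERS.index(cell[1])


def put(disk: bytearray, row: str, text: str) -> None:
    start = offset(row + "a")
    disk[start:start + COLS] = text.encode("ascii").ljust(COLS, b"\0")


def shred_file(disk: bytearray, passes: int = 3) -> None:
    """Overwrite the bytes the directory entry (row 'c') points at with random data."""
    entry = bytes(disk[offset("ca"):offset("ca") + COLS]).rstrip(b"\0").decode("ascii")
    start, end = entry.split("=", 1)[1].split("-")
    first, last = offset(start), offset(end)
    for _ in range(passes):
        disk[first:last + 1] = os.urandom(last - first + 1)  # same length, so the disk size is unchanged


disk = bytearray(26 * 26)
put(disk, "c", "credit.txt=da-ds")
put(disk, "d", "1234-3212-3456-5432")
shred_file(disk)
put(disk, "c", "")  # now delete the file as usual
print(b"1234-3212-3456-5432" in disk)  # False
```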
But what if we had a file with our credit card details in it that we deleted several months ago? What if its contents are still there, somewhere?
The only way to make absolutely sure that no recoverable data is left on the disk at all is to shred the whole thing. This does what we really wanted to do right at the start: remove absolutely everything from the disk. Because Windows locks the files it is currently using, and all OSes tend to write to the disk from time to time, you can’t do this from within a normal OS. You need something that runs independently of the installed system: Knoppix is really handy at this point! Do, of course, bear in mind that what you’re doing here is permanently and irreversibly wiping a disk drive, so make sure you remove, or at least unplug, any drives that you don’t want wiped! Accidents do happen...
From within Knoppix, you would open a terminal and run sudo fdisk -l (fdisk needs root privileges to read the drives) to list the disk drives it can detect. Check the list carefully: it should include the drive you want to wipe, along with any other drives that are still attached.
The naming system is a bit arcane if you’re used to Windows, with “C:” and “D:” for the hard drive and CD-ROM, but it’s simple enough to follow. All hard drive names start with “/dev/”, short for “device”: Linux gives every piece of the PC’s hardware a name beginning with “/dev/”. Typically, a hard drive will be “hd” if it’s IDE, or “sd” if it’s SATA (newer Linux kernels also use “sd” for IDE drives and for USB drives). Then comes a letter: the first hard drive will be “a”, the second “b”, and so on.
So if you have a simple IDE hard drive, it will be called “/dev/hda”. If you have a SATA drive with two partitions, the disk will be “/dev/sda” and the partitions will be “/dev/sda1” and “/dev/sda2”.
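The naming rules above are regular enough to decode mechanically. This Python sketch covers only the hd/sd names described here (other device types use other schemes), and `describe_device` is a hypothetical helper. The point it makes is the one that matters when wiping: a name with a number on the end is a single partition, while the name without a number is the whole drive.

```python
import re

# "/dev/" + "hd" or "sd" + a drive letter, plus an optional partition number.
DEVICE = re.compile(r"/dev/(hd|sd)([a-z])(\d*)")


def describe_device(path: str) -> str:
    """Explain an hd/sd-style Linux disk name such as /dev/hda or /dev/sda2."""
    m = DEVICE.fullmatch(path)
    if m is None:
        return "not an hd/sd disk name"
    prefix, letter, number = m.groups()
    disk = f"/dev/{prefix}{letter}"
    nth = ord(letter) - ord("a") + 1  # 'a' is the first drive, 'b' the second...
    if number:
        return f"partition {number} on drive {nth} ({disk})"
    return f"whole drive {nth} ({disk})"


for name in ["/dev/hda", "/dev/sda", "/dev/sda1", "/dev/sdb2"]:
    print(name, "->", describe_device(name))
```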
Simple enough, once you get the hang of it.
So, if your hard drive is a standard IDE drive, /dev/hda is the one you want to erase. You would issue the command shred /dev/hda (as root) and then go and find something else to do for a while, because this takes quite some time: there’s a lot of data to write. Older versions of shred overwrote the whole drive 25 times by default, which on a 100GB disk means writing 2500GB of data; current GNU versions default to 3 passes (300GB for the same disk). To do it just once, you would type shred -n 1 /dev/hda, but bear in mind that fewer passes is generally considered less secure.
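The arithmetic is simply capacity times the number of passes. Here is a quick Python sketch of it; the 100 MB/s sustained write speed used for the time estimate is an assumption for illustration, not a measured figure, and real drives vary widely. `data_written_gb` and `hours_needed` are hypothetical names.

```python
def data_written_gb(capacity_gb: float, passes: int, zero: bool = False) -> float:
    """Each pass rewrites the whole disk; -z adds one more pass of zeros."""
    return capacity_gb * (passes + (1 if zero else 0))


def hours_needed(data_gb: float, speed_mb_per_s: float = 100.0) -> float:
    """Rough time estimate at an assumed sustained write speed."""
    return data_gb * 1000 / speed_mb_per_s / 3600


for passes in (25, 3, 1):
    gb = data_written_gb(100, passes)
    print(f"{passes} passes: {gb}GB, about {round(hours_needed(gb), 1)} hours")
```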
Eventually, shred will leave you with a filesystem that looks like this:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a n # Y v C n $ } I / . ` b 0 J r n v 9 8 N % I : 3 ?
b = Y ` K c E b x x f W S p y \ g L l $ C ? ) , 8 k o
c O ! w | \ 7 2 v A i O I p w 5 v O k 1 \ I ` s T u a
d N g h j t y – 2 n c k m r 1 ( W 1 r . i < M _ L ‘ +
e @ } G L ^ ^ f ( t S = ] i ( D q ! r E 5 = K _ y 0 7
f % _ Z a o g I 2 . K v u O h D q q , A ` 2 0 E ” g ?
g K | k g 6 A ” j % S ? Z v a p t Z l x z < r P 3 D v
h > # n ( A e D * < _ [ N e x 7 i r T c a z f R t _ 3
i 9 M i # / K m E Z & k M ; m | C b * – > , _ * f i d
j | ( \ i m c o 3 k H & 5 G ; Q + ] m M w M 0 ) J E ?
k u ! T M r c ; 7 ` w < F , M \ 9 } a q # C j 0 Z u <
l O I p A : , D H } \ q 5 O 9 x z : C t { b > O ` G ;
m m V [ M p ` U p @ i C v n ‘ , s P | t I U Y T , / n
n h # h n i a J I R y b S y 0 A I W r U C 4 o F # b X
o – E ^ \ Q [ l U I + # u v { Y ( U _ @ = o ) h J _ m
p ^ L n t J # A ; V . ] m ! ] c a _ { , ” l m X \ o e
q % 6 n c g H x G 2 ^ , T ` ” ” / 0 > U X 8 % . 3 / 5
r ] f H f r h M ! c j W = 3 | I k | 6 J | X K f 3 T ,
s Y A > U / 0 Z $ y . C n T + & L } K o M m h { | s x
t _ o p L ] y g > _ N B & H 4 ; Y 3 B – j T m F . F o
u Q ? / F C ! Z j 3 : t E 9 s a o } _ H ” \ : q ] W #
v z ; w j W 2 : B * o P Q ! % 6 ” 9 L m z I t r 8 _ +
w = l V { h n 9 I t Y A r f r L d V H C $ s g ! { s J
x L ] I r E + q b Q \ y B & Q 3 I # $ W b , y x V Y t
y f $ ^ ‘ c O } @ 5 B _ 5 \ w 0 N Q j ( b – I w & ( ?
z ^ . y \ ” 2 F x ` V s # H 5 ; t ! } ! y 5 y ? e w #
If you’d rather it were returned to the pristine block of zeros we started with, add -z to the command (shred -z /dev/hda), and shred will finish with one extra pass that writes zeros instead of random data. We thus end up with this:
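For the curious, here is a Python sketch of a multi-pass wipe with an optional final pass of zeros, in the spirit of shred -n N -z. It works on an ordinary disk-image file, not a real drive: on Linux, `os.path.getsize` reports 0 for a block device, so this sketch can’t be pointed at /dev/hda, and it isn’t meant to be. `wipe` is a hypothetical name, and the real shred may also mix fixed patterns into its passes.

```python
import os
import tempfile


def wipe(path: str, passes: int = 3, zero: bool = False, chunk: int = 1 << 20) -> None:
    """Overwrite every byte of a regular file `passes` times with random data,
    then, if `zero` is set, once more with zeros (like shred -z)."""
    size = os.path.getsize(path)
    patterns = ["random"] * passes + (["zero"] if zero else [])
    with open(path, "r+b") as f:
        for pattern in patterns:
            f.seek(0)
            remaining = size
            while remaining:
                n = min(chunk, remaining)
                f.write(os.urandom(n) if pattern == "random" else bytes(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push this pass out to the disk before the next one


# Usage on a throwaway image file holding our two toy rows:
with tempfile.NamedTemporaryFile(delete=False) as img:
    img.write(b"credit.txt=da-ds".ljust(26, b"\0") + b"1234-3212-3456-5432".ljust(26, b"\0"))
wipe(img.name, passes=1, zero=True)
with open(img.name, "rb") as f:
    print(f.read() == bytes(52))  # True: back to a block of zeros
os.remove(img.name)
```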
a b c d e f g h i j k l m n o p q r s t u v w x y z
a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
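And your disk is now about as safe as it can be, short of placing it in solvent and leaving it there until it dissolves. Some have argued that data could, in theory, still be recovered after multiple random overwrites, but you’d need very expensive forensic equipment even to attempt it: not something the average eBayer is likely to have. One caveat for flash memory (memory cards, USB sticks, and SSDs): these devices remap writes internally to spread wear across their cells, so overwriting from the computer’s side may not reach every physical cell.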