There are many articles about recovering space on your computer, hard drive, or other storage, and they all talk about the same things. This article isn’t going to focus on those so much as on the not-so-obvious things you can do.
As a rule of thumb, there are always the usual suspects that are easy to do for recovering space:
1. Clear your temporary Internet files (Browser Caches)
2. Search for lost clusters (run scandisk)
3. Check and repair your registry.
4. Archive and back up data you haven’t accessed in over a year.
Let’s concentrate on the other, and single most important, thing you can do. Most people don’t realize it, but we keep multiple copies of the same file in multiple places on our computers. If you’re anything like me, good luck trying to find all of them.
It’s called deduplication. It’s not exactly an easy thing to do, because it’s incredibly time-consuming. Many people don’t realize how many copies of emails, documents, and so on they have on their hard drives, so they just upgrade to a bigger hard drive and bring the baggage along with them.
Well, the solution is simple in concept. It involves simply (or not so simply) iterating through every file on your hard drive and identifying the duplicates. This is a very painful process and can take weeks if done by hand. Even with my expertise, it still took 9 hours on a computer with 8 cores (i7), 16 GB of RAM, and an SSD. While the process was slow, manual, and painful, it did manage to free up over 150 GB of space on a 512 GB SSD. Was it worth it? I’d say yes.
So how does one do this? It’s going to require a bit of work, but I’ll explain the concept and let you decide whether you’d like to undertake the process.
1. Install a SQL database on your computer. (I actually booted into CentOS, mounted the filesystem, and then ran my queries against it.)
2. Create a DB table with a couple of fields. The most important ones are the absolute path and filename, saved together in one field, and the MD5 checksum we’ll be generating for each file.
3. Now iterate through the entire mounted filesystem and generate an MD5 checksum for every file on it. If you’d like to get creative (as I did), also save the last modified date of each file. (More on that in a moment.)
4. After the iteration process, you’re going to have a ridiculously large dataset.
5. Sort the table data by MD5 checksum, then by last modified date in descending order.
6. You should see many files with exactly the same MD5 checksum. Delete the oldest ones and keep only the newest (by last modified date).
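For anyone who wants to try steps 1 through 4 without setting up a database server, here is a minimal sketch in Python. It is not the exact setup described above: it swaps in SQLite (bundled with Python) for the SQL database, and the table name `files`, its columns, and the function names `md5_of_file` and `index_tree` are all made up for illustration.

```python
import hashlib
import os
import sqlite3


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(conn, root):
    """Walk `root` and record path, MD5, and last modified time of each regular file.

    The `files` table layout is an assumption for this sketch, not a standard.
    Returns the number of files recorded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL, mtime REAL NOT NULL)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                row = (path, md5_of_file(path), os.path.getmtime(path))
            except OSError:
                # Unreadable files (permissions, vanished mid-scan) are skipped.
                continue
            conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", row)
            count += 1
    conn.commit()
    return count
```

Something like `index_tree(sqlite3.connect("files.db"), "/mnt/disk")` would build the dataset, where both the database filename and the mount point are placeholders. Reading each file in chunks keeps memory use flat even for very large files.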
Once you’ve completed this process, you’re done. You now have only one copy of each file on your computer.
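Steps 5 and 6 can be sketched in Python as well. This assumes a SQLite table named `files` with `path`, `md5`, and `mtime` columns (hypothetical names chosen for this example, not from any particular tool), and it only lists removal candidates rather than deleting anything.

```python
import sqlite3


def removal_candidates(conn):
    """Return paths of duplicate files, keeping the newest copy per checksum.

    Assumes a table files(path, md5, mtime); that layout is an assumption
    made for this sketch.
    """
    # Sort by checksum, then newest first, so the first row seen for each
    # checksum is the copy to keep. `path` is a tie-breaker for equal mtimes.
    rows = conn.execute(
        "SELECT md5, path FROM files ORDER BY md5, mtime DESC, path"
    )
    seen = set()
    candidates = []
    for md5, path in rows:
        if md5 in seen:
            # A newer (or equally new) copy with this checksum was already kept.
            candidates.append(path)
        else:
            seen.add(md5)
    return candidates
```

Returning a list instead of deleting right away leaves room to review it first. Comparing the candidates byte for byte against the kept copy before removing them would also guard against the unlikely case of two different files sharing an MD5 checksum.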
If I get enough donations, I’ll make this a Java application and post it for free so it can run on any platform.
It wasn’t an easy or fun task, but going from 500+ GB of data down to 350-ish GB of data without losing anything is a pretty impressive way to go!
I can do this for you if you would like, but it would be purely on a consulting basis. Alternatively, you can just donate, and when I get enough donations to cover the time to write the application, I’ll write it and put it out there for anyone to download and use for free.
That is a promise!
Hope this little ditty helps those who actually want to recover a significant amount of space.
The next step is to do this on my Pegasus Array which has over 5 TB of data. (Yay!!!)
I should also mention that the performance increase on your computer will go through the roof!!! There’s a lot less I/O overhead from dealing with stuff that you never needed anyway.
3 thoughts on “Recovering space on your hard drive”
Awesome post. I am very glad to have found it. Thanks so much; I’m looking forward to contacting you. Will you please drop me an e-mail?
Thanks. I’m glad you enjoy this blog. Spread the word!
A, B and C are the only ways I know to recover a deleted file. Since you copied over the file, it is a little more difficult to recover. There’s no robust undelete functionality built into Windows, so try to find the file in your Recycle Bin or recover it as follows:
A. Recovering a file in your Recycle Bin:
1. Double-click the Recycle Bin icon on your desktop to open the window displaying all files in the Recycle Bin.
2. Highlight the file you wish to restore, right-click it and select the Restore option. This will restore the file to the original location it was deleted from. Or highlight the files you wish to restore, right-click and select Cut, then browse to the location you wish to move the deleted item to and click Paste. Or drag and drop the icons from the Recycle Bin to the folder you wish to store them in.
B. Try a freeware program and/or purchase a program. Below is a list of freeware file recovery programs that can be used freely to recover lost data:
-> PC Inspector File Recovery
-> Restoration
-> Undelete Plus
-> FreeUndelete
In addition to the above freeware programs, several companies have created programs designed to recover your lost data. For example, PowerQuest makes the utility Drive Image, which in some cases can be used to recover data from a hard drive.
C. Use a service from a company that specializes in recovering lost data:
Use a local data recovery company or an out-of-state one. One word of caution: these services can sometimes be very expensive, so they are only recommended if the data is extremely important. Below is a listing of a few major data recovery companies:
Action Front Data Recovery
CBL Data Recovery Technologies Inc.
Doctor Byte
DriveSavers Data Recovery
Lazarus Data Recovery
Ontrack
Virtual Data Recovery
Stellar Data Recovery
Sorry, I couldn’t do more.