Lately I noticed a problem that appeared out of nowhere and wouldn’t go away. It was causing tons of 404s on my site. Now that’s not good. The problem was simply the following:
I would have a link on my site. Let’s say the real URL is:
http://swimminginthought.com/estore
For some odd reason, I was getting referrals from Google linking to: http://swimminginthought.com/?/estore
Needless to say, this was causing me tons of grief and thousands of 404 errors. (Yes, I actually monitor these things as any good webmaster / technologist should).
I realized that .htaccess and mod_rewrite would be the best way to solve it, but doing so turned into a 20+ man-hour battle over 2 weeks. Why? The lack of help on the Internet.
Yes, there are tons of postings on mod_rewrite, but when you have a metacharacter like “?” in your URL, mod_rewrite makes the problem nearly impossible to solve, and nobody seemed to have documented it or how to fix it. The catch is that everything after the “?” is the query string, and a RewriteRule pattern only ever sees the path. It was a very messy situation.
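To see how the bad URL actually splits apart, here is a small Python sketch. Python is only my choice for illustration and has nothing to do with Apache itself; it just uses the standard library's urlsplit to show which part is the path and which part is the query string.

```python
from urllib.parse import urlsplit

bad_url = "http://swimminginthought.com/?/estore"
parts = urlsplit(bad_url)

# Everything after the first "?" is treated as the query string,
# so the path portion of this URL is just "/".
print("path: ", parts.path)
print("query:", parts.query)
```

The “estore” part ends up in the query string rather than the path, which is why a rule that only looks at the path never finds it.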
I did the research (and extensively sharpened my Reg-Ex skills just so you know). I knew Reg-Ex, but I always muddled through it. Not that it’s hard, but I rarely used it.
I went to the Apache documentation and even jumped on the IRC channel for Apache support. I’m not going to name names, but it was one of the rudest and most insulting experiences of my life. The support was non-existent. Just continuous grief from an individual who felt he needed to power trip. I could name him, but I’m not one to trash anybody. Like my mama said, “If you have nothing nice to say, don’t say it.” So I choose not to name the person.
Sorry for the tangent, but it still chaps my knickers 24 hours later. Volunteer channel or not, you don’t treat people like they’re sub-human. I don’t blame him. I blame his parents for raising him that way.
So getting back to the story, I used a wonderful tool which is an .htaccess emulator. While it’s not complete, it does get you 90% of the way. So here’s the answer.
Just put the code below at the top of your .htaccess file. It takes the /?/ out of the URL and redirects back to http://swimminginthought.com/estore (in my case domain/post-name), or whatever the post-name or page-name is.
The %1 is a backreference to the first parenthesized group that the RewriteCond pattern captured from the query string. If that pattern had more groups, you could use %2, %3, and so on, arranged however you want. Groups captured by the RewriteRule pattern itself are referenced as $1, $2, etc. instead.
This is where the RewriteRule comes in: it takes what the RewriteCond just before it captured and lets you use it in the substitution. The “?” at the end of %1 tells Apache to drop the original query string instead of appending it to the new URL.
As for the bracketed flags, [NC] means ignore case, and [R=301,L] means redirect permanently (301 code) and last rule in the chain, respectively.
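# This strips the /?/ from the URL
# Turn the rewrite engine on (harmless if it is already on elsewhere in this file)
RewriteEngine On
RewriteBase /
# Only act when the query string starts with "/"; capture what follows as %1
RewriteCond %{QUERY_STRING} ^/(.*)$ [NC]
# 301 to the captured name; the trailing "?" discards the old query string
RewriteRule ^(.*) %1? [R=301,L]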
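If you want to sanity-check the logic outside of Apache, here is a rough Python imitation of what those lines do. It is only a sketch of the idea, not how mod_rewrite is implemented: the function name strip_question_mark is made up for this example, and it assumes the site lives at the domain root (the RewriteBase / case).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Same pattern as the RewriteCond: the query string must start with "/".
# IGNORECASE mirrors the [NC] flag.
QUERY_PATTERN = re.compile(r"^/(.*)$", re.IGNORECASE)


def strip_question_mark(url):
    """Return the redirect target for a /?/ style URL, or None if the
    URL does not match and should be left alone."""
    parts = urlsplit(url)
    match = QUERY_PATTERN.match(parts.query)
    if match is None:
        return None
    # match.group(1) plays the role of %1; the empty query string
    # plays the role of the trailing "?" in the substitution.
    new_path = "/" + match.group(1)
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(strip_question_mark("http://swimminginthought.com/?/estore"))
print(strip_question_mark("http://swimminginthought.com/estore"))
print(strip_question_mark("http://swimminginthought.com/?s=search-term"))
```

The first URL comes back as the clean /estore address, while the other two return None. In other words, normal pages and ordinary query strings (the ?s=search-term one is just a made-up example) are not touched.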
I know the problem wasn’t mine: I had examined my sitemaps thoroughly, they were validated by multiple validators, and the hyperlinks to the respective posts were correct. The only errors I was getting were intermittent, and they came from Google only. So I’d have to say the problem was Google’s, considering I’ve been hearing they’ve been having problems lately with this sort of thing.
I sincerely hope this helps more than one of you out there considering I couldn’t find any of this documented. If this helps, please consider donating. Every bit counts these days.