Hi, I am having problems with my sitemap not being generated properly. It is being generated, but it isn't in the correct XML format. Interestingly, if you look towards the bottom of the sitemap you can see the CMS pages are being generated just fine in the proper format, but not the products. They are just one continuous line of text.
Has anybody ever encountered this before? I've attached the XML file.
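One quick way to tell whether the file is actually malformed or just missing line breaks (XML on a single line is still well-formed) is to parse it and count the entries. The following is only a minimal sketch using Python's standard library; the function name `summarize_sitemap` and the sample URLs are made up for illustration, and it assumes the file uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri} form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def summarize_sitemap(xml_text):
    """Return (count, locs) for every <loc> in the sitemap.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML, which is the case worth ruling out first.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return len(locs), locs


# Hypothetical sample: two entries, the second pair run together on one line.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "  <url><loc>http://www.example.com/about-us</loc></url>"
    "<url><loc>http://www.example.com/some-product.html</loc></url>"
    "</urlset>"
)

count, locs = summarize_sitemap(sample)
print(count, "URLs found")
```

If this parses and the count matches the number of products plus CMS pages, the file is probably usable as-is and only looks wrong when viewed as text. If it raises a parse error, the error message includes the line and column where the XML breaks.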
Thanks Ben, it certainly is very odd, and I'm no nearer to finding a solution. For the time being I've created a sitemap manually. It was certainly a shock opening Webmaster Tools to see that only 14 URLs were showing instead of over 19,000! I've transferred a copy of the site onto a different server to rule out any server issue, and disabled all extensions from their respective XML files, all with no success! I'll leave this for another day I think, and carry on making sitemaps manually for now.
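For anyone else stuck building the sitemap by hand in the meantime, this is a rough sketch of generating one from a plain list of URLs. It is not how Magento does it; `build_sitemap` and the example URLs are hypothetical, and where the URL list comes from (an export, a crawl) is left open. The sitemaps.org protocol allows up to 50,000 URLs per file, so around 19,000 fits in a single file.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Build a minimal sitemaps.org urlset document from an iterable of URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() converts &, < and > so query strings do not break the XML.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical URLs for illustration only.
example_urls = [
    "http://www.example.com/blue-widget.html",
    "http://www.example.com/catalog?cat=3&page=2",
]

sitemap_xml = build_sitemap(example_urls)
print(sitemap_xml)
```

Writing one `<url>` element per line also keeps the file readable when opened in a text editor.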
I'm really not sure what to suggest, but here's where I might start to make sure I was working with good data...since you mentioned expecting 19,000 urls or so.
The obvious moves are to clear all caches and re-index. Re-indexing can get hung up depending on how much needs to be indexed and your server resources, so you might want to try re-indexing from the command line. You can find a couple of tutorials on this just by Googling for it.
Another thought: if you don't have URL rewrites that you've entered manually, you can empty ("truncate") the URL rewrite table in the database and then run that particular re-index process to start fresh. If there are any mismatched products, they can recreate themselves with each re-index, causing that database table to grow huge. Don't drop the table completely, just empty it and then re-index your URL rewrites... AND BACK IT UP FIRST!
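The "back it up, then empty it" order can be sketched as below. This is only an illustration of the sequence: it runs against an in-memory SQLite database as a stand-in (a real Magento store uses MySQL), the table is a cut-down imitation of Magento 1's `core_url_rewrite`, and `backup_and_empty` is a made-up helper name. The re-index itself still has to be run from Magento afterwards.

```python
import sqlite3


def backup_and_empty(conn, table, backup_table):
    """Copy every row of `table` into `backup_table`, then empty `table`.

    The original table is only emptied once the copy's row count matches.
    Table names are interpolated directly, so pass trusted names only.
    """
    cur = conn.cursor()
    cur.execute('CREATE TABLE "%s" AS SELECT * FROM "%s"' % (backup_table, table))
    original = cur.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()[0]
    copied = cur.execute('SELECT COUNT(*) FROM "%s"' % backup_table).fetchone()[0]
    if original != copied:
        raise RuntimeError("backup incomplete; original table left untouched")
    # DELETE keeps the table definition in place (empty it, do not drop it).
    cur.execute('DELETE FROM "%s"' % table)
    conn.commit()
    return copied


# Stand-in database with a simplified, assumed subset of the real columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core_url_rewrite (store_id INTEGER, request_path TEXT, target_path TEXT)"
)
conn.executemany(
    "INSERT INTO core_url_rewrite VALUES (?, ?, ?)",
    [
        (1, "blue-widget.html", "catalog/product/view/id/1"),
        (1, "red-widget.html", "catalog/product/view/id/2"),
        (1, "about-us", "cms/page/view/id/3"),
    ],
)

rows_backed_up = backup_and_empty(conn, "core_url_rewrite", "core_url_rewrite_backup")
print("rows backed up:", rows_backed_up)
```

A full database dump taken beforehand is still the safer backup; the copy table here only makes it quick to restore the rewrites if the re-index does not rebuild them as expected.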
This will at least let you know you're starting with fresh and good data. Then wait for the cron task to run and create that sitemap.
I did have that problem in 1.7, with that table getting bigger after every re-index. I searched for duplicate URLs, both category and product, but couldn't find any. In the end I ran a cron job every day at 03:00 to truncate that table, and then another cron job at 03:05 to re-index the products! A bit of a cowboy fix, but there was no obvious reason (such as duplicate URLs) for it happening. That problem seems to have disappeared after upgrading to 1.8 though. Unfortunately I'm not sure of the timescale, and whether the sitemap issue started before or after the 1.8 upgrade. I obviously checked the 1.8 upgrade before putting it live, but didn't for one minute think to check the sitemap generation. Largely the 1.8 upgrade was smooth, with only an issue with a scriptaculous JS folder going missing, and that was an easy fix. I'll check again for duplicates though.
Those might be the same product, and with each re-index run, it could add another. It's a really weird issue and I haven't been able to track down why it happens, but I have seen it before.
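If the extra rows are variations of the same product's URL, an exact-match search for duplicates would not find them. The following is a rough heuristic sketch only: it assumes the variants differ by a trailing numeric suffix such as `-1` or `-2` (which this thread does not confirm), and that the rewrite rows have been exported as `(store_id, request_path)` pairs. `group_by_base_path` and the sample rows are hypothetical.

```python
import re
from collections import defaultdict

# Matches a trailing "-<digits>" that sits right before an optional ".html".
_NUMERIC_SUFFIX = re.compile(r"-\d+(?=(\.html)?$)")


def group_by_base_path(rows):
    """Group request paths that differ only by a trailing numeric suffix.

    `rows` is an iterable of (store_id, request_path) pairs. Returns a dict
    mapping (store_id, base_path) to the list of matching paths, keeping
    only groups with more than one member.
    """
    groups = defaultdict(list)
    for store_id, request_path in rows:
        base = _NUMERIC_SUFFIX.sub("", request_path)
        groups[(store_id, base)].append(request_path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical exported rows for illustration.
sample_rows = [
    (1, "blue-widget.html"),
    (1, "blue-widget-1.html"),
    (1, "blue-widget-2.html"),
    (1, "red-widget.html"),
    (2, "blue-widget.html"),
]

suspects = group_by_base_path(sample_rows)
for (store_id, base), paths in suspects.items():
    print(store_id, base, paths)
```

Products whose real URL key ends in a number will show up as false positives, so the output is a list of candidates to check by hand, not a list of rows to delete.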
When you did your upgrade, did you overwrite files, or did you start with a fresh 1.8 fileset and pull in your template, media and other files? Just wondering if there are some old 1.7 files stuck in there, maybe?
I did the upgrade in my staging area. I copied the site to my staging subdomain, did the upgrade via Magento Connect, and it all went pretty smoothly. I had to reset all the file and directory permissions back to 644 and 755 via SSH, but I always seem to have to do that after I upgrade via Magento Connect for some reason. Once I was happy it was all working, I transferred it over to put the new version live.
I remember reading about the problem with duplicates when it was happening to me, but I'm certain that has stopped now with 1.8, as the rewrite table doesn't seem to get any larger after each re-index.