Several pages with "-1" appearing in my SEO spider crawls, and a question regarding XML sitemaps

Ryan R asked on April 8, 2015 18:48

Working on my sitemap and using Screaming Frog SEO to do my crawl.

I am getting several pages appended with "-1" such as:

example.com/sales
example.com/sales-1

Of course I want the main page, but do not want the page called "-1". When I type in that URL I do get a seemingly legitimate page, but it is missing part of the template. I suspect this has to do with some aliasing, but am not sure. My site of about 50 pages produces 367 lines in my crawl. I need to clear out all those extra items, and I cannot figure out how to purge Kentico of all those old objects/URLs.

My plan is to crawl the site, then remove all the items that I do not want on my XML sitemap.

When I upload that to Google, will Google respect my XML sitemap over a normal Googlebot crawl?

In other words, I want Google to see the pages that I choose in my sitemap; I do not want them to do their own crawl. Will they just crawl what's in my XML, and ignore all the -1's and remnant aliases?

Recent Answers


Brenden Kehren answered on April 8, 2015 18:51

Are you using the Google Sitemap webpart on a specific page, or are you just using the googlesitemap.xml at the root of the site? If the latter, I'd recommend using the webpart because you can specify which page types you want to include (I'm guessing "sales-1" is a different page type than "sales"), which would resolve this.

Another solution is to go through each of those pages and, on the Navigation tab, uncheck the "Show in sitemap" box.


Ryan R answered on April 8, 2015 19:07

I am only using a sitemap webpart on my actual sitemap page (example.com/sitemap); that page is there for human visitors. I have also gone in and checked "Show in sitemap" only for the pages I want in that tree. When I view that sitemap page it looks perfect - it includes only the exact items I want.

For the robots I am just placing sitemap.xml in the root. To generate that file I am using Screaming Frog (set to Googlebot), and this is where I am getting tons of results. So my plan is to do my crawl, delete all the extraneous results, then export that as sitemap.xml, and place it in the root and/or upload it in Google Webmaster.

This is not in production yet so I really don't know how it's going to work.


Brenden Kehren answered on April 8, 2015 20:16

Check out the Google Sitemap webpart, not just the Sitemap webpart that displays a sitemap. You need the auto-generated XML. This can be done very simply with the Google Sitemap webpart. There should be no reason to manually generate your sitemap XML; just configure the webpart and set your page properties.


Ryan R answered on April 8, 2015 20:30

Ok cool, I am going to give that a shot.

As always, thanks for your help!


Ryan R answered on April 8, 2015 20:55 (last edited on April 8, 2015 20:57)

I got the sitemap running as you described (followed the K8 docs) and that certainly gives me a much cleaner sitemap.

There is another related issue though. We are using jQuery tabs on several pages, and want to treat some of those tabs as their own page. We have some custom code that lets us navigate to a specific tab via URL. For example, I have example.com/sales with 3 tabs there, and I can navigate to them using example.com/sales (tab 1), example.com/sales/cars (tab 2), and example.com/sales/trucks (tab 3), which will spit you out on the right tab. This is working great.
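The custom tab-routing code isn't shown in the thread, but the idea can be sketched as a small function that maps the last URL segment to a tab index. This is a hypothetical reconstruction, not the site's actual code; the `tabSlugs` names and the `#tabs` selector are assumptions:

```javascript
// Hypothetical mapping of URL segments to jQuery UI tab indexes for /sales.
// tab 1 = /sales, tab 2 = /sales/cars, tab 3 = /sales/trucks
const tabSlugs = ["", "cars", "trucks"];

// Given a path like "/sales/cars", return the zero-based tab index to activate.
function tabIndexForPath(path) {
  const segments = path.replace(/\/+$/, "").split("/").filter(Boolean);
  // The first segment is the page itself; anything after it names a tab.
  const slug = segments.length > 1 ? segments[segments.length - 1] : "";
  const index = tabSlugs.indexOf(slug);
  return index === -1 ? 0 : index; // unknown segment falls back to the first tab
}

// On page load, something like this would select the tab encoded in the URL:
// $("#tabs").tabs({ active: tabIndexForPath(window.location.pathname) });
```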

Now I need my sitemap to reflect the fact that my page "sales" is actually 3 pages. In other words, my sitemap should contain:

example.com/sales
example.com/sales/cars
example.com/sales/trucks

and this is why I was attempting to build my own sitemap. Any ideas on how to do this with the setup you describe? If I could edit the resulting XML file and add my aliases, I should be in the ballpark. Since the webpart sitemap is dynamic and created on the fly, I don't know how to edit it.
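If you do end up hand-maintaining a static file, the standard sitemap protocol format is simple. The three entries above would look roughly like this (the scheme and domain are placeholders from the example, and optional tags like `lastmod` are omitted):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/sales</loc></url>
  <url><loc>http://example.com/sales/cars</loc></url>
  <url><loc>http://example.com/sales/trucks</loc></url>
</urlset>
```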

Another thought is I could save the dynamic googlesitemap.xml, then edit, then just drop it in the root, in the "traditional" manner.


Brenden Kehren answered on April 9, 2015 01:09

So are those jQuery tabs separate pages or just static HTML? If they aren't separate pages, then you'll have to make your own mods to the XML manually.


Ryan R answered on April 9, 2015 17:45

There is a child page that has all the HTML for each tab (separated by div/ID) and then that content is pulled into the tabs.

They are not separate pages, but using the Kentico aliases, our custom code, and a customized sitemap.xml I am hoping to have them appear as separate pages to Googlebot (or at least each have their own URL).


Brenden Kehren answered on April 9, 2015 19:37

Then why not create child pages for the tabs (with no content) and have them simply redirect to /parent#nametag? Wouldn't this work and still get you a dynamic sitemap?

/parent/child1 -> /parent#child1
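The mapping above can be illustrated with a tiny helper. In Kentico you would more likely configure the redirect on the child page itself; this client-side sketch (all names illustrative) just shows the path-to-fragment transformation:

```javascript
// Hypothetical helper for the "dummy" child page approach: a child page at
// /parent/child1 redirects the browser to /parent#child1.
function redirectTarget(path) {
  const segments = path.replace(/\/+$/, "").split("/").filter(Boolean);
  const child = segments.pop();                 // last segment, e.g. "child1"
  return "/" + segments.join("/") + "#" + child; // e.g. "/parent#child1"
}

// On the child page template:
// window.location.replace(redirectTarget(window.location.pathname));
```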


Ryan R answered on April 9, 2015 19:51

I see what you mean about creating the "dummy" child pages with a redirect - that would get me the sitemap I need. Google would see a "real" URL, nothing with an ID at the end (which would be truncated in the SERP, as far as I know).

Another cool thing about our method is that you see just a clean URL in the address bar: example.com/sales/trucks, etc. I like it with the extension hidden. Nice and semantic.

Until I go live and let the bots crawl the site I just don't know what I am going to get, but of course that is always the case with SEO isn't it?

