Each of the numbered steps below must be run, the data analyzed, and the results output back to the script screen. The script asks for the domain, competitor keywords, and so on, then runs the steps, formats the results, and returns the values.
This is an online script with forms for filling in all data fields and a built-in debugger. Reports are stored in a database along with the reporter's email and username / password.
Utilities to scrape, with URLs:
1. http://juicystudio.com/services/readability.php#readingresults
(scrape and return the results as a CSV table)
2. Take a screen capture of what the website looks like with CSS, JavaScript, and cookies disabled.
3. Check for a robots.txt file *(always in the root of the domain)*, pull the content out of the file, and bring back the data.
4. See if a sitemap exists; if it does, bring back the data.
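Steps 3 and 4 can be sketched as below. This is a minimal illustration, not a finished fetcher: the function names `robots_url` and `sitemap_lines` are made up for this example, the fallback to `http://` for bare domains is an assumption, and the actual download is left out. It relies on the fact that robots.txt sits at the domain root and may list sitemap locations on `Sitemap:` lines.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site: str) -> str:
    """Return the robots.txt URL at the root of the given site."""
    # Assumption: a bare domain such as "example.com" is treated as http://.
    if "://" not in site:
        site = "http://" + site
    parts = urlsplit(site)
    # robots.txt lives at the root, so drop any path, query, or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def sitemap_lines(robots_body: str) -> list[str]:
    """Collect the URLs given on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_body.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```

If robots.txt lists no sitemap, a reasonable fallback (again an assumption) is to try `/sitemap.xml` at the root.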
5. Generate a 404 error and take a screenshot of the 404 page.
6. Using the SEOmoz API,
Google API call: number of pages of your site that are indexed.
Yahoo Search: number of pages of your site that are indexed.
Bing.com: number of pages of your site that are indexed.
7. Screenshot of the website as seen with the Googlebot user agent.
8. Site crawler that counts the number of pages on the site (bring back data).
9. Make API calls to Yahoo, Google, and Bing to find how many pages each has indexed (bring back data).
Put a timer in so the tool does not get banned.
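One way to build the timer is a small throttle that enforces a minimum gap between requests. This is a sketch under assumptions: the class name `Throttle` and the interval are illustrative, and the right delay depends on each service's terms. The clock and sleep functions can be swapped out so the logic can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each search-engine or API request keeps the calls spaced out.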
10. Calculate responses. A table lists the response classes:
200
300
400
500
It says how many requests worked (200), and how many were reported as 300, 400, and 500.
Try using this:
http://www.xml-sitemaps.com/standalone-google-sitemap-generator.html – modify the script to pull title tags and meta tags. Save the data.
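The response table in step 10 amounts to bucketing each crawled page's HTTP status code by its hundreds digit. A minimal sketch follows; the function name is illustrative, and ignoring codes outside the four listed classes is an assumption.

```python
def tally_status_classes(status_codes):
    """Count responses per class: 200, 300, 400, 500."""
    counts = {"200": 0, "300": 0, "400": 0, "500": 0}
    for code in status_codes:
        bucket = f"{code // 100}00"  # e.g. 301 -> "300", 404 -> "400"
        # Assumption: codes outside the four classes (e.g. 1xx) are ignored.
        if bucket in counts:
            counts[bucket] += 1
    return counts
```

For example, `tally_status_classes([200, 200, 301, 404])` counts two pages under 200 and one each under 300 and 400.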
11. After the scrape, check whether all pages showed up on Google and whether the root URL is at the top.
12. For the brand name (a filled-out field, i.e. the company name without the URL), does it come up on the SERP? If so, is it number one? Is it on the first page?
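Once the result URLs for the brand-name search are in hand, the questions in step 12 reduce to finding the site's position in that list. The sketch below assumes the results arrive as an ordered list of URLs, that a "first page" holds 10 results, and that `www.` should be ignored when comparing hosts; the names are illustrative.

```python
from urllib.parse import urlsplit

RESULTS_PER_PAGE = 10  # assumption: a first page shows 10 results


def _host(url):
    """Lower-cased host name without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def brand_position(result_urls, site):
    """Report where the site first appears in an ordered result list."""
    wanted = _host(site)
    for position, url in enumerate(result_urls, start=1):
        if _host(url) == wanted:
            return {
                "position": position,
                "is_number_one": position == 1,
                "on_first_page": position <= RESULTS_PER_PAGE,
            }
    return {"position": None, "is_number_one": False, "on_first_page": False}
```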
13. Do a cache:URL lookup, get the top cached pages in Google, and bring back the data.
14. google.com/webmasters (login and password taken from the person using the tool for the site)
- Scrape the following:
- The user might have more than one site listed; grab the right site (based on URL).
- Scrape all data and extract the following:
-- Has Googlebot successfully accessed your homepage (yes or no)
-- Are pages from your site included in Google's index (yes or no)
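Picking the right site from an account with several listed sites can be done by comparing host names. This is a sketch only: the function names are made up for illustration, and treating `www.example.com` and `example.com` as the same site is an assumption that may not hold for every account.

```python
from urllib.parse import urlsplit


def host_key(url):
    """Lower-cased host name without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def pick_site(listed_sites, target_url):
    """Return the listed site whose host matches the target, or None."""
    wanted = host_key(target_url)
    for site in listed_sites:
        if host_key(site) == wanted:
            return site
    return None
```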
15. In the same query, extract the following:
Links to your site table.
16. Go to the diagnostics table and grab the 3 images and tables:
- Pages crawled per day
- Kilobytes downloaded per day
- Time spent downloading a page
Bring the data back to the report.
17. Using the scrape data from step 10, find the following:
- Total Pages
- Pages with Unique Titles
- Pages with Duplicate Title Tags
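The three counts in step 17 can be derived from a mapping of page URL to title tag. The sketch below assumes that shape for the step 10 data and assumes titles should be compared ignoring case and extra whitespace; both are choices made for this example.

```python
from collections import Counter


def title_report(titles_by_url):
    """Count total pages, pages with unique titles, and pages with duplicates."""
    # Assumption: titles differing only in case or spacing count as duplicates.
    normalized = {
        url: " ".join(title.split()).lower()
        for url, title in titles_by_url.items()
    }
    counts = Counter(normalized.values())
    unique = sum(1 for title in normalized.values() if counts[title] == 1)
    return {
        "total_pages": len(normalized),
        "unique_titles": unique,
        "duplicate_titles": len(normalized) - unique,
    }
```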
18. Use the Copyscape API to see if there is duplicate content. If not, say no; if so, bring back the URLs with duplicate content.
Use the 5 pages used for the cache date check.
19. Look at the homepage and count how many internal links are on the page.
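Counting internal links can be sketched with the standard library alone: collect every `href` on the page, resolve it against the page URL, and keep those on the same host. The names below are illustrative, and the exact-host comparison (so `www.` and non-`www.` count as different hosts) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_internal_links(html, page_url):
    """Count links that resolve to the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlsplit(page_url).netloc.lower()
    internal = 0
    for href in parser.hrefs:
        # Resolve relative links such as "/about" against the page URL.
        target = urlsplit(urljoin(page_url, href))
        if target.scheme in ("http", "https") and target.netloc.lower() == host:
            internal += 1
    return internal
```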
20. IP info: run a whois report and bring back the following on the IP:
- Where the IP is hosted (GoDaddy, etc.)
- What country they are in
- Where the host is located (state / country)
21. Use https://siteexplorer.search.yahoo.com/mysites and see if we can use the Yahoo API to get the following info:
– How many backlinks are coming into the site.
– How many are deep links (not to the homepage).
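Given the list of pages on the site that the backlinks point to, splitting homepage links from deep links is a matter of checking the path. In this sketch the input format is an assumption, as is the rule that a root path with a query string counts as a deep link; the function name is illustrative.

```python
from urllib.parse import urlsplit


def split_backlinks(target_urls):
    """Count backlinks overall and those that point past the homepage."""
    deep = 0
    for url in target_urls:
        parts = urlsplit(url)
        # The homepage is the root path with no query string.
        is_home = parts.path in ("", "/") and not parts.query
        if not is_home:
            deep += 1
    return {"total_backlinks": len(target_urls), "deep_links": deep}
```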
22. Load Time Test:
- Use http://www.websiteoptimization.com/services/analyze/
- Grab the load time
- Does the site pass W3C standards? (Use the W3C validator, download the table, and store it. If it has warnings, ignore them; if it has errors, mark this as failed.)
23. The user enters their top 3 keywords for competitive analysis and, under each keyword, enters 3 competitor URLs.
– For each competitor, grab the date created (from whois).
– Use Alexa to get the Alexa rating for each competitor and their URL.
Display like this:
Alexa Traffic Rank Site Comparison
www.COMPANYURL.com/ 939,046
Competitor 1: www.chicagocriminallaw.com/ 3,170,142
Competitor 2: www.criminal-law-lawyer-source.com/ 537,304
Competitor 3: www.mjpetro.com/ 5,459,986
For those 3 competitors plus the main site, hit Google Trends, Quantcast, Compete, and Alexa, grab that data, and display it as follows:
bring back a report for each one (take a screenshot).
If there is no data, say "no data". Alexa will be scraped because of its tabs, and the results input into fields.
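The Alexa comparison display in step 23 can be produced with a small formatter. This is a sketch: the function name and argument shapes are assumptions, ranks are expected as integers, and a rank of `None` stands for a site with no data.

```python
def format_rank_table(main_site, main_rank, competitors):
    """Build the rank comparison text, one site per line."""

    def fmt(rank):
        # Thousands separators as in the sample display; None means no data.
        return "no data" if rank is None else f"{rank:,}"

    lines = ["Alexa Traffic Rank Site Comparison"]
    lines.append(f"{main_site} {fmt(main_rank)}")
    for number, (url, rank) in enumerate(competitors, start=1):
        lines.append(f"Competitor {number}: {url} {fmt(rank)}")
    return "\n".join(lines)
```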
24. Use http://www.seomoz.org/linkscape
There is an API somewhere; otherwise run the basic report on the top 3 competitors and the site, and bring back the table.
Answer the following:
- The 5 most common anchor texts for each competitor and the main site.
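Once the anchor texts for a site have been pulled from the link report, the 5 most common can be found with a counter. The sketch assumes the anchors arrive as a plain list of strings and that case and extra whitespace should be ignored when grouping; the function name is illustrative.

```python
from collections import Counter


def top_anchor_texts(anchors, n=5):
    """Return the n most common anchor texts as (text, count) pairs."""
    # Assumption: "Law", "law " and "LAW" are the same anchor text.
    cleaned = [" ".join(a.split()).lower() for a in anchors if a and a.strip()]
    return Counter(cleaned).most_common(n)
```

Run it once per competitor and once for the main site to fill the report.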
25. For the top 3 competitors, take the following:
Meta description, meta keywords, and title tag, and bring the data back in a table.
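Pulling the title tag, meta description, and meta keywords out of a fetched page can be sketched with the standard library parser. The class and function names are illustrative, the page download is left out, and missing tags are returned as empty strings by assumption.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Capture the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head_tags(html):
    """Return title, description, and keywords; empty string if absent."""
    parser = HeadTagParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
    }
```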
26. Use SEMrush,
http://www.semrush.com/api.html
to get the following:
- Keyword
- Keyword Pos
- Average Volume
- Cost Per Click
- URL
- AdWords Traffic
- AdWords Traffic Price