Cody Stephenson
8,361 Points

I think the form is broken (needs updating on Treehouse's side).

I tried a bunch of troubleshooting before actually trying to submit the form manually. When I did that, I got this message:

"This form isn't set up yet. Do you own this website? If so, please login and create a form. Then update your HTML or JavaScript with the new form endpoint. More information here."

Chris Freeman
Treehouse Moderator 68,454 Points

Where did you get stuck? I was able to use the code from the video:

# setting up my venv
$ mkvirtualenv scrapy_tth
$ pip install scrapy
$ mkdir scraping_data
$ cd scraping_data/
# TTH material
$ scrapy startproject tthscrape
$ cd tthscrape/
$ cd tthscrape/spiders
## create formSpider.py
$ cd ../../
$ scrapy crawl horseForm

formSpider.py:
from scrapy.http import FormRequest
from scrapy.spiders import Spider


class FormSpider(Spider):

    name = 'horseForm'

    start_urls = ['https://treehouse-projects.github.io/horse-land/form.html']

    def parse(self, response):
        formdata = {'firstname': 'chris',
                    'lastname': 'freeman',
                    'jobtitle': 'student',
                    }
        return FormRequest.from_response(response, formnumber=0,
                                         formdata=formdata,
                                         callback=self.after_post)

    def after_post(self, response):
        print("\n**********\n\nForm processed\n\n**********\n\n")
        print(response)
Cody Stephenson
8,361 Points

When I run scrapy crawl horseForm, I get a massive output without any of the expected portions. I think the relevant error is:

2021-06-29 15:03:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://treehouse-projects.github.io/horse-land/form.html> (referer: None)
2021-06-29 15:04:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://formspree.io/robots.txt> (referer: None)
2021-06-29 15:04:01 [scrapy.core.engine] DEBUG: Crawled (404) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
2021-06-29 15:04:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://formspree.io/content+scrapy@teamtreehouse.com>: HTTP status code is not handled or not allowed
2021-06-29 15:04:01 [scrapy.core.engine] INFO: Closing spider (finished)

but there are a few hundred lines more that I don't think are relevant. Is it expected that I would get the error when I enter information in the form manually?
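One way to find the relevant lines in a long Scrapy log is to filter for responses with an error status. The sketch below uses only the standard library and assumes the log follows the "Crawled (status) <METHOD url>" pattern seen in the excerpt above; the function name `failed_requests` is made up for this illustration and is not part of Scrapy.

```python
import re

# Matches engine log lines such as:
#   ... DEBUG: Crawled (404) <POST https://example.com/path> (referer: ...)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) ([^>\s]+)>")


def failed_requests(log_text):
    """Return (status, method, url) for each crawled response with status >= 400."""
    failures = []
    for line in log_text.splitlines():
        match = CRAWLED.search(line)
        if match and int(match.group(1)) >= 400:
            failures.append((int(match.group(1)), match.group(2), match.group(3)))
    return failures


# Three lines copied from the log excerpt in this thread.
sample_log = """\
2021-06-29 15:03:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://treehouse-projects.github.io/horse-land/form.html> (referer: None)
2021-06-29 15:04:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://formspree.io/robots.txt> (referer: None)
2021-06-29 15:04:01 [scrapy.core.engine] DEBUG: Crawled (404) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
"""

for status, method, url in failed_requests(sample_log):
    print(status, method, url)
```

With the sample above, only the 404 POST to formspree.io is reported, which matches the line the spider middleware then ignores.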

2 Answers

Chris Freeman
Treehouse Moderator 68,454 Points

Hey Cody Stephenson, I have to agree that the form is broken (or at least the setup is incomplete at formspree.io). Using the form on the website https://treehouse-projects.github.io/horse-land/form directly yields "This form isn't set up yet".

I manually hacked the form.html page to revert the change below to the form action, as shown in the GitHub history:

-        <form action="https://formspree.io/ken.alger+scrapy@teamtreehouse.com" method="POST">
+        <form action="https://formspree.io/content+scrapy@teamtreehouse.com" method="POST">

src: https://github.com/treehouse-projects/horse-land/commit/d559009a92a8de48ba7f2a62483fbd38060324ce#diff-c684eb10223c1ee8f91719ff3ea8b5756b5841ceb1111df54a52281e4d9a4174

When I manually use "ken.alger" instead of "content" in the form action, formspree.io returns a valid response.

Tagging Ken Alger: does the form "content+scrapy@teamtreehouse.com" need to be set up?
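To confirm which endpoint a page's form actually posts to, the form action can be read straight from the HTML. This is a minimal standard-library sketch, not the course's method. The class `FormActionParser`, the function `form_actions`, and the one-line `page` snippet are made up for illustration, with the form tag taken from the diff above.

```python
from html.parser import HTMLParser


class FormActionParser(HTMLParser):
    """Collect (method, action) pairs for every <form> tag in a document."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attributes = dict(attrs)
            # HTML forms default to GET when no method attribute is given.
            method = (attributes.get("method") or "GET").upper()
            self.actions.append((method, attributes.get("action")))


def form_actions(html):
    """Return a list of (method, action) tuples found in the HTML string."""
    parser = FormActionParser()
    parser.feed(html)
    return parser.actions


# Illustrative snippet; the <form> tag matches the "+" line of the diff above.
page = (
    '<form action="https://formspree.io/content+scrapy@teamtreehouse.com" method="POST">'
    '<input name="firstname"></form>'
)

print(form_actions(page))
```

Running this against the saved form.html should show whether the page still points at the "content+scrapy" endpoint that returns the 404.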

David Sampimon
12,026 Points

Bump: just leaving a comment here that I am running into the same issue in May 2022.

2022-05-06 12:49:47 [scrapy.core.engine] DEBUG: Crawled (404) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
2022-05-06 12:49:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://formspree.io/content+scrapy@teamtreehouse.com>: HTTP status code is not handled or not allowed
2022-05-06 12:49:47 [scrapy.core.engine] INFO: Closing spider (finished)
2022-05-06 12:49:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
Chris Freeman
Treehouse Moderator 68,454 Points

It might help escalate the issue if you send a link to this forum page to help@teamtreehouse.com