1 comment on this product found across Reddit:

sarrysyst /r/learnpython
2 points

Regarding the links, there is a lot of unnecessary information stored in the URLs. Essentially, you really only need the product ID, which is this part:

https://www.amazon.com/dp/B06ZZ1MKW8/?colid...
--> B06ZZ1MKW8 is the product ID

You can simply store the base URL ('https://www.amazon.com/dp/') in a constant and the IDs separately in a list, dictionary, etc., concatenating them to get the product URL. That brings me to the next part.
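As a minimal sketch of that idea (the second product ID below is a made-up placeholder, not a real listing):

```python
# Store the base URL once and keep only the short product IDs.
BASE_URL = "https://www.amazon.com/dp/"

product_ids = ["B06ZZ1MKW8", "B000000000"]  # second ID is a placeholder

# Concatenate base + ID whenever you need the full product URL.
product_urls = [BASE_URL + pid for pid in product_ids]
print(product_urls[0])  # https://www.amazon.com/dp/B06ZZ1MKW8
```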

Storing your data between sessions. There are different options; the most common are JSON, pickle, or a database. JSON is a string-based format, similar to a nested list/dictionary stored in text form. A pickle stores your Python objects (e.g. dictionaries) as a binary file. Both JSON and pickle files can be loaded/saved at the beginning/end of your script. Finally, there are databases, of which sqlite3 is probably the most basic and easiest variant.
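For the JSON route, a rough sketch could look like this (the filename "products.json" and the data layout are just assumptions for illustration):

```python
import json
import os

DATA_FILE = "products.json"  # assumed filename

def load_products():
    # Load previously saved data at the start of the script, if any.
    if os.path.exists(DATA_FILE):
        with open(DATA_FILE) as f:
            return json.load(f)
    return {}  # first run: start with an empty dict

def save_products(products):
    # Dump the dict back to disk at the end of the script.
    with open(DATA_FILE, "w") as f:
        json.dump(products, f, indent=2)

products = load_products()
products["B06ZZ1MKW8"] = {"name": "example product", "last_price": 19.99}
save_products(products)
```

Swapping `json` for `pickle` (and opening the file in binary mode) gives you the pickle variant with the same load/save structure.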

Since you want to write some kind of tracker in the future, you probably want to go for a database. Like I said, the easiest of these is sqlite3, which is also part of the Python standard library. For actually running the tracker there are again different options: hosting your script as a service, or scheduling it (e.g. cron jobs). The latter is probably easier, since it doesn't involve any web frameworks.
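A hedged sketch of what a price-tracker table with the stdlib sqlite3 module might look like; the schema and column names are assumptions, and an in-memory database is used here only so the example is self-contained:

```python
import sqlite3

# Use a file path like "tracker.db" instead of ":memory:" to persist data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS prices (
           product_id TEXT,
           price REAL,
           checked_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)

# Each scheduled run would insert the freshly scraped price.
conn.execute(
    "INSERT INTO prices (product_id, price) VALUES (?, ?)",
    ("B06ZZ1MKW8", 19.99),
)
conn.commit()

rows = conn.execute("SELECT product_id, price FROM prices").fetchall()
print(rows)  # [('B06ZZ1MKW8', 19.99)]
```

With the price history in a table like this, checking for drops is a simple query over past rows for a given product ID.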

I'm not really sure what you mean by the adding to the wish list. If the page is dynamically loaded, you can either check the network tab in your browser's developer tools and see if you can work something out, or use a web driver like Selenium or a library like requests-html. By the way, if you want to crawl a larger number of pages, a web scraping framework like Scrapy is better suited for the job than an HTML parser like BeautifulSoup.