
  • How to use Python to extract table data from websites?

     Wong updated 3 years, 6 months ago 2 Members · 3 Posts
  • Justin

    Administrator
    October 25, 2020 at 11:37 am

    I want to extract the table data from the BC Immigration website:

    https://www.welcomebc.ca/Immigrate-to-B-C/B-C-Provincial-Nominee-Program/Invitations-to-Apply

    How many different methods can we use? Which one is best?

  • Justin

    Administrator
    October 25, 2020 at 11:54 am

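    One way is to use the pandas package: its read_html() function reads the HTML tables on a page into a list of DataFrames. It is very simple and convenient. Are there any other methods?

    from io import StringIO

    import pandas as pd
    import requests

    url = 'https://www.welcomebc.ca/Immigrate-to-B-C/B-C-Provincial-Nominee-Program/Invitations-to-Apply'
    # Download the page; .text gives the decoded HTML as a string
    html = requests.get(url).text
    # read_html() returns a list of DataFrames, one per <table> it finds
    tables = pd.read_html(StringIO(html))
    # Take the second DataFrame in the list (index 1)
    df = tables[1]
    # Save it to an Excel file
    df.to_excel('/kaggle/working/skilled.xlsx', sheet_name='skilled', index=False)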

    Also, I want to extract the table names from the web content:

    Table 1: Skills Immigration and Express Entry BC

    Table 2: Entrepreneur Immigration

    How can I grab them and assign one to each element of the list?
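
    On the question of other methods: besides pandas read_html(), the table cells can also be collected with Python's standard-library html.parser module. The following is only a minimal sketch. The TableParser class and the sample HTML are made up for illustration and are not taken from the real page, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect each <table> as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell text (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        # Store the current row (if any) in the most recent table.
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr":
            self._close_row()
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


# Made-up sample markup, not the real page content
SAMPLE_HTML = """
<table align="center">
  <tr><th>Date</th><th>Invitations</th></tr>
  <tr><td>Example date</td><td>10</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
print(parser.tables)
```

    For the real page, the string from requests.get(url).text could be passed to parser.feed() instead of the sample. Compared with read_html() this is more code, but it needs no third-party packages.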

    • Wong

      Member
      October 25, 2020 at 9:34 pm

      The table name string can be found here:

      <h3>Entrepreneur Immigration</h3>\r\n\r\n<table align="center"

      We can use 'table align' as a keyword to locate it: in this snippet, the table name is the <h3> heading right before the <table align=...> tag.
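
      A minimal sketch of that idea, using the standard-library re module. It assumes every table name is a plain-text <h3> followed only by whitespace and then a <table ...> tag, as in the snippet above. The function name and the sample HTML are made up for illustration.

```python
import re

# Assumed pattern: an <h3> heading with plain text, then only whitespace,
# then a <table ...> tag. Headings with nested tags are not matched.
TITLE_BEFORE_TABLE = re.compile(r"<h3[^>]*>([^<]*)</h3>\s*<table\b", re.IGNORECASE)


def extract_table_titles(html):
    """Return the text of each <h3> that sits directly before a <table> tag."""
    return [title.strip() for title in TITLE_BEFORE_TABLE.findall(html)]


# Made-up sample modelled on the snippet above, not the full page
SAMPLE_HTML = (
    '<h3>Skills Immigration and Express Entry BC</h3>\r\n\r\n'
    '<table align="center"><tr><td>1</td></tr></table>'
    '<h3>Some other heading</h3><p>No table follows this one.</p>'
    '<h3>Entrepreneur Immigration</h3>\r\n\r\n'
    '<table align="center"><tr><td>2</td></tr></table>'
)

titles = extract_table_titles(SAMPLE_HTML)
print(titles)
```

      With the real page, the HTML string from requests.get(url).text would go in place of the sample. The titles could then be paired with the DataFrames from read_html(), for example with zip(), but only after checking that both lists have the same length and order, since read_html() may also return tables that have no such heading.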
