  • Keidra Navaroli

Week 12 Notes on Data Scraping Tools

Reflection and Application

The majority of the tools covered in Week 12 are outside of my immediate area of expertise. As such, I initially struggled to see their relevance. However, after spending some time exploring their respective capabilities (and revisiting the informative lectures presented by my classmates), I gained a greater appreciation for their use within my own research. These data scraping tools will allow me to determine trends in social media as they occur, providing a valuable data set for future explorations. In future iterations of this class, it would also be beneficial to explore Orange software.


TAGS – (Twitter Archiving Google Sheet) allows one to search Twitter for specific terms or hashtags

(Note: you must have a Google/Gmail account and a Twitter account to run it.)

  1. Make a copy of the TAGS Google Sheet template.

  2. On the Readme/Settings sheet, enter the following settings (starting in cell B9):

    • Who are you = any web address that identifies you or your event

    • Search term = what you are looking for, e.g., #cetis12

    • Period = default

    • No. results = 1500 (this is the maximum Twitter allows, but without authenticated access you might get fewer. See the Advanced setup for info on configuration)

    • Continuous/paged = continuous

  3. To configure the spreadsheet to update automatically, select Tools > Script Editor … and then, in the Script Editor window, select Triggers > Current script’s triggers… and add a new trigger. Set ‘collectTweets’ to run as a ‘Time-driven’ trigger, choosing a time period that suits your search (for unauthenticated access I collect 1500 tweets every hour). Click ‘Save’.

  4. The collection can also be triggered manually via TAGS > Run Now! (Results appear on the ‘Archive’ sheet.)
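Once tweets are in the ‘Archive’ sheet, one way to start looking for trends is to export the sheet as a CSV file and count hashtags. The following is only a minimal sketch: the sample data is made up, and the column names ("from_user", "text") are assumptions that should be checked against the headers in your own sheet.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample standing in for a CSV export of the 'Archive' sheet.
# The column names below are assumptions; check the headers in your own sheet.
SAMPLE_ARCHIVE_CSV = """from_user,text
alice,Great session at #cetis12 today
bob,Slides from #cetis12 are up #opendata
carol,Looking forward to #CETIS12
"""

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def count_hashtags(csv_file, text_column="text"):
    """Count hashtags (case-insensitively) in one column of a CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for tag in HASHTAG_PATTERN.findall(row.get(text_column) or ""):
            counts[tag.lower()] += 1
    return counts


hashtag_counts = count_hashtags(io.StringIO(SAMPLE_ARCHIVE_CSV))
for tag, n in hashtag_counts.most_common():
    print(f"#{tag}: {n}")
```

To use it on a real export, replace the `io.StringIO(...)` sample with an opened file, for example `open("archive.csv", newline="", encoding="utf-8")` (the file name here is hypothetical).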


See additional resources from Martin Hawksey:


Voyant – open source tool for text analysis. Helpful for examining how words may be grouped together to form meaning (e.g., "Hip hop").


A complete tutorial guide can be accessed here: https://voyant-tools.org/docs/#!/guide/tutorial

There are three main ways of selecting a corpus in Voyant Tools:

1. type or paste into the main text area, either normal text or a set of URLs, one per line; then hit the "Reveal" button

2. open an existing corpus (such as Austen or Shakespeare)

3. upload one or more files from your computer

(The upload file selector should allow you to choose one or more files using Ctrl and Shift keys. If you have several documents to add at once, it may be easiest to first create a zip archive containing the files and then upload the one zip file.)
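If you prefer to build the zip archive with a script rather than by hand, the sketch below bundles every .txt file in a folder into one zip file using Python's standard library. The folder and file names are hypothetical, and the demo runs in a temporary directory so it does not touch real files.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_text_files(source_dir, archive_path):
    """Bundle every .txt file in source_dir into a single zip archive."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for text_file in sorted(source_dir.glob("*.txt")):
            # arcname keeps only the file name, not the full folder path.
            archive.write(text_file, arcname=text_file.name)
    return archive_path


# Demo with two made-up documents in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    corpus_dir = Path(tmp) / "corpus"
    corpus_dir.mkdir()
    (corpus_dir / "doc1.txt").write_text("First sample document.", encoding="utf-8")
    (corpus_dir / "doc2.txt").write_text("Second sample document.", encoding="utf-8")

    archive_file = zip_text_files(corpus_dir, Path(tmp) / "corpus.zip")
    with zipfile.ZipFile(archive_file) as archive:
        archived_names = archive.namelist()

print(archived_names)
```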

Once you select a corpus you will be presented with the default skin (the default configuration of tools) that includes the following tools:


Bookmarking Your Corpus

One of the most interesting features of Voyant Tools is the ability to bookmark and share URLs that refer to your collection of texts. Among other advantages, this allows you to work with the same texts during different sessions, without having to reload all the documents each time. You can export a link for your corpus and the current set of tools by clicking on the “Export” (diskette) icon in the blue bar at the top, or export a link for an individual tool by clicking on the “Export” icon in one of the tool panes.

A corpus will remain accessible as long as it is accessed at least once a month.


Note about Searches: It's possible to search for a single word, but more advanced searches are also supported with a special syntax (hovering over the question mark in the search box shows examples of the syntax). Try the search terms (in bold) in the list below (you can remove a query by hitting x in the box surrounding the query, or hitting backspace to delete it). Also notice that Voyant tries to suggest search terms as you type; you can click on a suggestion to add it to the queries.


Voyant doesn't directly support notions like singular and plural forms, but you can determine what forms are present ("^dog*") and then decide whether you want to combine forms ("dog|dogs") or keep them separate ("dog,dogs"). That helps for individual queries, but of course it doesn't help much when you want to see all singular and plural forms combined in a frequency list.
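To make the difference between these three query styles concrete, here is a rough Python imitation of the idea. It is not how Voyant works internally; the sample sentence and function names are made up for illustration. Listing forms by prefix mirrors "^dog*", summing chosen forms mirrors "dog|dogs", and reporting them side by side mirrors "dog,dogs".

```python
import re
from collections import Counter

# Made-up sample sentence for illustration.
SAMPLE_TEXT = "The dog barked. Two dogs barked back, and a dogged walker called her dog."


def term_counts(text):
    """Lowercase the text and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def forms_with_prefix(counts, prefix):
    """List every form starting with prefix, each with its own count."""
    return {term: n for term, n in counts.items() if term.startswith(prefix)}


def combined_count(counts, forms):
    """Treat several forms as one term by adding their counts together."""
    return sum(counts[form] for form in forms)


def separate_counts(counts, forms):
    """Keep the chosen forms apart, one count per form."""
    return {form: counts[form] for form in forms}


counts = term_counts(SAMPLE_TEXT)
present_forms = forms_with_prefix(counts, "dog")
dog_total = combined_count(counts, ["dog", "dogs"])
dog_split = separate_counts(counts, ["dog", "dogs"])

print(present_forms)
print(dog_total)
print(dog_split)
```

Note that the prefix search also picks up "dogged", which is not a plural of "dog". This is why it helps to look at the forms first and then choose which ones to combine.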


AntConc – another program for analyzing electronic texts in order to find and reveal patterns in language. NOTE: Files must be loaded as plain text.


See YouTube explanation by classmate PS Berge: https://www.youtube.com/watch?v=c3KUqUnOY_E


Steps for each text type:

Loading a corpus

  1. Start AntConc

  2. Choose from the ‘File’ menu ‘Open File(s)’ (navigation help)

  3. Select desired file(s)

  4. Click ‘Open’ button

Word List

  1. Click on the ‘Word List’ tab

  2. Click ‘Start’ button
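For readers curious about what a word list is doing behind the scenes, the following is a minimal Python sketch of a frequency-ranked word list. It is an illustration of the idea under simple assumptions (lowercasing, a basic word pattern, a made-up sample sentence), not AntConc's actual implementation.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
SAMPLE_TEXT = "Hip hop is culture. Hip hop is history, and hip hop is language."


def word_list(text):
    """Return (word, frequency) pairs, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common()


ranked_words = word_list(SAMPLE_TEXT)
for rank, (word, freq) in enumerate(ranked_words, start=1):
    print(rank, word, freq)
```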

Concordance

  1. Click on the word from the list in the ‘Word List’ tab

  2. Click on the ‘Concordance’ tab

  3. Enter word into the search box

  4. Click ‘Start’ button
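A concordance shows each occurrence of a search word together with the words around it (often called keyword in context). The sketch below imitates that idea in Python. The sample text, the function name, and the `width` parameter are all assumptions for illustration and do not reflect AntConc's own settings.

```python
import re

# Made-up sample text for illustration.
SAMPLE_TEXT = "Hip hop is culture. Hip hop is history, and hip hop is language."


def concordance(text, keyword, width=2):
    """Return (left context, hit, right context) for each match of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


kwic_lines = concordance(SAMPLE_TEXT, "hop", width=2)
for left, hit, right in kwic_lines:
    # Right-align the left context so the hits line up in one column.
    print(f"{left:>15} [{hit}] {right}")
```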

Collocates

  1. Click on the ‘Collocates’ tab

  2. Choose the Window Span for your search

  3. Enter word into the search box

  4. Click ‘Start’ button
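The Window Span controls how many words to the left and right of the search word are considered. As a rough sketch of that idea, the Python below counts the words that fall inside the window. It only counts raw frequencies; any statistical ranking that AntConc may apply is not reproduced here, and the sample text and `span` parameter are assumptions for illustration.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
SAMPLE_TEXT = "Hip hop is culture. Hip hop is history, and hip hop is language."


def collocates(text, keyword, span=2):
    """Count words appearing within `span` words of each keyword match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


collocate_counts = collocates(SAMPLE_TEXT, "hop", span=2)
for word, n in collocate_counts.most_common():
    print(word, n)
```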

Clusters

  1. Click on the ‘Clusters/N-Grams’ tab

  2. Choose the Cluster Size for your search

  3. Enter word into the search box

  4. Click ‘Start’ button
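Clusters are short runs of neighboring words that include the search word. The Python sketch below counts every run of a given size that contains the word in any position. This is a simplification for illustration; AntConc offers its own options for where the search word sits in the cluster, which are not modeled here. The sample text and `size` parameter are assumptions.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
SAMPLE_TEXT = "Hip hop is culture. Hip hop is history, and hip hop is language."


def clusters(text, keyword, size=2):
    """Count every run of `size` consecutive words that contains keyword."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if keyword in gram:
            counts[" ".join(gram)] += 1
    return counts


cluster_counts = clusters(SAMPLE_TEXT, "hop", size=2)
for cluster, n in cluster_counts.most_common():
    print(cluster, n)
```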
