Twitter app8

When you want to read these tweets back into your Python environment, all you need to do is the following.

# and if we want to reuse these tweets at a later time we can reload them
old_tweets = pd.read_csv('/Users/brendan.tierney/Dropbox/tweets.csv')

old_tweets

Tweet app9

That's all very easy!

 

Saving Tweets to a Database

There are two ways to add tweets to table in the database. There is the slow way (row-by-row) or the fast way doing a bulk insert.

Before we get started with inserting data, lets get our database connection setup and the table to store the tweets for our date. To do this we need to use the cx_oracle python library. The following codes shows the setting up of the connections details (without my actual login details), establishes the connects and then retrieves some basic connection details to prove we are connected.

# import the Oracle Python library
import cx_Oracle

# define the login details
p_username = "..."
p_password = "..."
p_host = "..."
p_service = "..."
p_port = "1521"

# create the connection
con = cx_Oracle.connect(user=p_username, password=p_password, dsn=p_host+"/"+p_service+":"+p_port)
cur = con.cursor()

# print some details about the connection and the library
print("Database version:", con.version)
print("Oracle Python version:", cx_Oracle.version)

Database version: 12.1.0.1.0
Oracle Python version: 6.3.1

Now we can create a table based on the current date.

# drop the table if it already exists
#drop_table = "DROP TABLE TWEETS_" + cur_date
#cur.execute(drop_table)

cre_table = "CREATE TABLE TWEETS_" + cur_date + " (tweet_id number, screen_name varchar2(100), place varchar2(2000),
lang varchar2(20), date_created varchar2(40), fav_count number, retweet_count number, tweet_text varchar2(200))"

cur.execute(cre_table)

Now lets first start with the slow (row-by-row) approach. To do this we need to take our Panda data frame and convert it to lists that can be indexed individually.

lst_tweet_id = [item[0] for item in rows3]
lst_screen_name = [item[1] for item in rows3]
lst_lang =[item[3] for item in rows3]
lst_date_created = [item[4] for item in rows3]
lst_fav_count = [item[5] for item in rows3]
lst_retweet_count = [itememotion-12 for item in rows3]
lst_tweet_text = [item[7] for item in rows3]

#define a cursor to use for the the inserts
cur = con.cursor()
for i in range(len(rows3)):
#do the insert using the index. This can be very slow and should not be used on big data
cur3.execute("insert into TWEETS_2018_06_12 (tweet_id, screen_name, lang, date_created, fav_count, retweet_count,
tweet_text) values (:arg_1, :arg_2, :arg_3, :arg_4, :arg_5, :arg_6, :arg_7)",
{'arg_1':lst_tweet_idemotion-55, 'arg_2':lst_screen_nameemotion-55, 'arg_3':lst_langemotion-55, 'arg_4':lst_date_createdemotion-55,
'arg_5':lst_fav_countemotion-55, 'arg_6':lst_retweet_countemotion-55, 'arg_7':lst_tweet_textemotion-55})

#commit the records to the database and close the cursor
con.commit()
cur.close()

Tweet app10

Now let us look a quicker way of doing this.

WARNING: It depends on the version of the cx_oracle library you are using. You may encounter some errors relating to the use of floats, etc. You might need to play around with the different versions of the library until you get the one that works for you. Or these issues might be fixed in the most recent versions.

The first step is to convert the panda data frame into a list.

rows = [tuple(x) for x in tweets_pd.values]
rows

Tweet app11

Now we can do some cursor setup like setting the array size. This determines how many records are sent to the database in each batch. Better to have a larger number than a single digit number.

cur = con.cursor()
cur.bindarraysize = 100
cur2.executemany("insert into TWEETS_2018_06_12 (tweet_id, screen_name, place, lang, date_created, fav_count,
retweet_count, tweet_text) values (:1, :2, :3, :4, :5, :6, :7, :8)", rows)

Check out the other blog posts in this series of Twitter Analytics using Python.