There are many mathods to connect db in python. We introduce two packages sqlalchemy and psycopg2. We assume .env file in which connection settings are are defined.
2021-07-29 19:56:34,134 INFO sqlalchemy.engine.Engine select version()
2021-07-29 19:56:34,136 INFO sqlalchemy.engine.Engine [raw sql] {}
2021-07-29 19:56:34,165 INFO sqlalchemy.engine.Engine select current_schema()
2021-07-29 19:56:34,165 INFO sqlalchemy.engine.Engine [raw sql] {}
2021-07-29 19:56:34,206 INFO sqlalchemy.engine.Engine select id from companies limit 10
2021-07-29 19:56:34,208 INFO sqlalchemy.engine.Engine [raw sql] {}
id
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
psycopg2
Code
import psycopg2import pandas as pdimport timefrom sshtunnel import SSHTunnelForwarderdef queryRedshift(sql): conn = psycopg2.connect( host=os.environ['DWH_HOST'], port=os.environ['DWH_PORT'], dbname=os.environ['DWH_DATABASE'], user=os.environ['DWH_USERNAME'], password=os.environ['DWH_PASSWORD']) cur = conn.cursor() cur.execute(sql) result = cur.fetchall() colnames = [col.name for col in cur.description]# pandas.DataFrameで返す用の処理 new_result = [[one for one in one_result] for one_result in result] result = pd.DataFrame(new_result,columns=colnames) cur.close() conn.close()# 連続で叩くと凄くヤバいので1秒待つ time.sleep(1)return result
Code
queryRedshift(q)
id
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
Bigquery
It is known that the bq performance in python is depends on a connection method. https://medium.com/@davide.sarra/slow-bigquery-results-no-more-8aa4dde92613
Lets campare short and long time queris ## short time query
Code
query_short_time="""select id from {}.{}.companies""".format(os.environ['BQ_PROJECT_NAME'], os.environ['BQ_DATASET_NAME'])