IT/가짜연구소 스터디

[DA] 4-3. Importing Data from Databases

Millennials 2022. 9. 13. 21:37

해당 내용은 Datacamp의 Data engineering track을 정리했습니다.
4. Streamlined Data Ingestion with pandas의 chapter 1에 대한 내용입니다.

1. Introduction to databases

관계형 데이터베이스는 테이블(table)들에 행(rows)와 열(columns)로 이루어져 있습니다. 행들에 각 실제 값들이 들어가게 되고, column에 그 속성이 들어가게 됩니다.

관계형 데이터베이스는 고유한 식별자(unique keys)를 통해 테이블을 연결하거나 관리할 수 있다는 점에서 다른 종류의 데이터들 데이터프레임, excel 등과 다릅니다.

데이터베이스에서 데이터를 읽는 것은 2단계 프로세스를 거칩니다.

database에 연결(connect)합니다.
SQL 혹은 Pandas 언어로 query를 보내 가져옵니다.

관계형 데이터베이스와 작업할 수 있는 도구가 있는 SQLAlchemy 라이브러리를 사용하게 될 것입니다. 특히 SQLAlchemy 의 create_engine() 을 이용해서 데이터 베이스에 연결해서 핸들할 수 있는 엔진을 만들 것입니다.

엔진은 데이터 베이스의 URL을 가져와 데이터베이스에 연결하도록 합니다. 이 URL 패턴은 데이터베이스 종류마다 조금씩 다를 수 있습니다.

pandas 에서는 pd.read_sql(query, engine) 을 사용합니다.
query : SQL 문
engine : 데이터베이스에 연결하는 engine

# Load libraries
import pandas as pd
from sqlalchemy import create_engine

# Create the database engine
engine = create_engine('sqlite:///data.db')

# Load hpd311calls without any SQL
hpd_calls = pd.read_sql("hpd311calls", engine)

# View the first few rows of data
print(hpd_calls.head())

# Create a SQL query to load the entire weather table
query = """
SELECT * 
  FROM weather;
"""

# Load weather with the SQL query
weather = pd.read_sql(query, engine)

# View the first few rows of data
print(weather.head())

# 두 쿼리의 결과는 같습니다.

2. Refining imports with SQL queries

테이블 열에서 고유한 값들을 가져오는 것은 SQL에서 DISTINCT을 사용할 수 있습니다.
SUM, AVG, MAX, MIN () 괄호 안에 하나의 목표하는 COLUMN 이름이 들어옵니다.
COUNT(*)

    # Create query for unique combinations of borough and complaint_type
    # borough와 complaint_type 각각 유니크 한 값이 아니라 쌍이 unique한 값이 불러와진다.
    query = """
    SELECT DISTINCT borough, 
         complaint_type
    from hpd311calls;
    """


# Load results of query to a dataframe

issues\_and\_boros = pd.read\_sql(query, engine)

# Check assumption about issues and boroughs

print(issues\_and\_boros)

3. Loading multiple tables with joins

SQL은 테이블끼리 연결된 테이블 값이 있을 경우, 해당 테이블들을 join시키는 것이 가능합니다.


# Query to get heat/hot water call counts by created\_date
'''
query = """  
SELECT hpd311calls.created\_date,  
COUNT(\*)  
FROM hpd311calls  
WHERE hpd311calls.complaint\_type = "HEAT/HOT WATER"  
GROUP BY hpd311calls.created\_date;  
"""

# Query database and save results as df

df = pd.read\_sql(query, engine)

# View first 5 records

print(df.head())

저작자표시

'IT > 가짜연구소 스터디' 카테고리의 다른 글

[DA] 4-4. Importing JSON Data and Working with APIs (0)	2022.09.13
[DA]4-2. Importing Data From Excel Files (0)	2022.09.13
[DA] 4-1. Importing Data from Flat Files (0)	2022.09.12
[DA] 3-4. Case Study: DataCamp (0)	2022.09.12

현재글[DA] 4-3. Importing Data from Databases

코딩, AI 데이터 사이언스 공부하는 흔한 밀레니언 세대 yy001003@gmail.com

20대 향수, 부수입, 부업, 타입향수, 더샘 향수, 우드세이지, 인터넷 돈벌기, 카피향수, 가성비 좋은 보조배터리, 쿠팡 파트너스, 빈티지 워터, 저가형 모니터, 연천 가볼만한 곳, 중소기업 모니터, 쿠팡, 건대 양식, 최신 모니터, 가성비충전기, 웹페이지 호스팅, 더샘 어반브리즈 빈티지워터,

Today :
Yesterday :

일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

흔한밀레니얼의 사람이야기

[DA] 4-3. Importing Data from Databases

1. Introduction to databases

2. Refining imports with SQL queries

3. Loading multiple tables with joins

'IT > 가짜연구소 스터디' 카테고리의 다른 글

'IT/가짜연구소 스터디'의 다른글

티스토리툴바

[DA] 4-3. Importing Data from Databases

1. Introduction to databases

2. Refining imports with SQL queries

3. Loading multiple tables with joins

'IT > 가짜연구소 스터디' 카테고리의 다른 글

'IT/가짜연구소 스터디'의 다른글

관련글

티스토리툴바