
PyPI


- Python Package Index

- Repository of Python packages

- https://pypi.org/


# Installing packages with pip

# Latest version

pip install 'SomeProject'

# A specific version

pip install 'SomeProject==1.4'

# Version constraints: a range (>=1,<2) or a compatible release (~=1.4.2)

pip install 'SomeProject>=1,<2'

pip install 'SomeProject~=1.4.2'

Source: https://packaging.python.org/tutorials/installing-packages/
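pip evaluates these constraints with the PEP 440 version-specifier rules. To make the two constraint forms above concrete, here is a minimal sketch that models only `>=lower,<upper` and `~=base`, assuming plain numeric versions such as `1.4.2` (no pre-, post-, or dev-release suffixes). The helper names `release`, `pad`, `in_range`, and `compatible` are hypothetical and are not part of pip.

```python
# Hypothetical helpers: a simplified model of two PEP 440 specifiers.
# Assumption: versions are plain numeric releases like "1.4.2".


def release(version: str) -> tuple[int, ...]:
    """Split '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in version.split("."))


def pad(parts: tuple[int, ...], length: int) -> tuple[int, ...]:
    """Right-pad with zeros so that 1.4 compares equal to 1.4.0."""
    return parts + (0,) * (length - len(parts))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Model 'SomeProject>=lower,<upper'."""
    v, lo, hi = release(version), release(lower), release(upper)
    n = max(len(v), len(lo), len(hi))
    return pad(lo, n) <= pad(v, n) < pad(hi, n)


def compatible(version: str, base: str) -> bool:
    """Model 'SomeProject~=base': at least base, and the same release
    prefix as base with its last component dropped (~=1.4.2 means
    >=1.4.2 and ==1.4.*)."""
    v, b = release(version), release(base)
    if len(b) < 2:
        raise ValueError("~= needs at least two release components")
    n = max(len(v), len(b))
    prefix = b[:-1]
    same_prefix = pad(v, len(prefix))[: len(prefix)] == prefix
    return pad(v, n) >= pad(b, n) and same_prefix


print(in_range("1.9.3", "1", "2"))   # True
print(in_range("2.0", "1", "2"))     # False
print(compatible("1.4.7", "1.4.2"))  # True
print(compatible("1.5.0", "1.4.2"))  # False
```

The quotes around the specifiers in the pip commands matter in a Unix shell: without them, `>` and `<` would be read as redirection operators.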


Major packages


NumPy

- Numerical Python
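NumPy's core type is the `ndarray`, a fixed-type N-dimensional array that the pandas examples below also build on. A minimal sketch of creating one and applying element-wise and per-axis arithmetic (the array values are arbitrary):

```python
import numpy as np

# A 2 x 3 ndarray built from nested lists
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)        # (2, 3)

# Vectorized arithmetic applies element-wise, no Python loop needed
print(a * 10)

# Reductions can run along an axis: axis=0 collapses rows (column sums)
print(a.sum(axis=0))  # [5 7 9]
print(a.mean())       # 3.5
```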

Pandas


- Built for data analysis, modeled on R's data.frame.

- Tutorial

https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python

- A step-by-step explanation with Q&A

Python - Pandas Tutorial 1 (creating, accessing, deleting, and modifying DataFrames)

- DataFrame inputs: ndarray, dictionary, DataFrame, Series, list


import numpy as np
import pandas as pd

# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))

       0  1  2
    0  1  2  3
    1  4  5  6


# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))

       1  2  3
    0  1  1  2
    1  3  2  4


# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(pd.DataFrame(my_df))

       A
    0  4
    1  5
    2  6
    3  7


# Take a Series as input to your DataFrame
# (pandas 0.23+ on Python 3.6+ keeps the dict's insertion order)
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi", "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))

                             0
    United Kingdom      London
    India            New Delhi
    United States   Washington
    Belgium           Brussels
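The list of DataFrame inputs also includes `list`, which the examples above do not show. A minimal sketch of the two common list shapes (the column names `num` and `letter` are made up for illustration):

```python
import pandas as pd

# A list of lists: each inner list becomes one row
rows = [[1, "a"], [2, "b"], [3, "c"]]
df_rows = pd.DataFrame(rows, columns=["num", "letter"])
print(df_rows)

# A list of dicts: the keys become column names
records = [{"num": 1, "letter": "a"}, {"num": 2, "letter": "b"}]
df_records = pd.DataFrame(records)
print(df_records)
```

The list-of-dicts form is handy when rows come from JSON-like records, since no separate `columns` argument is needed.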


df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Use the `shape` property
print(df.shape)

    (2, 3)

# Or use the `len()` function with the `index` property
print(len(df.index))

    2
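The linked tutorial also covers accessing, modifying, and deleting DataFrame data. A minimal sketch of those three operations on the same 2 x 3 data, with column names `a`, `b`, `c`, `d` chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]), columns=["a", "b", "c"])

# Access: by label with .loc, by integer position with .iloc
print(df.loc[0, "b"])    # row label 0, column "b" -> 2
print(df.iloc[1, 2])     # second row, third column -> 6
print(df["a"].tolist())  # a whole column as a list -> [1, 4]

# Modify: assign to a single cell, or derive a new column
df.loc[0, "b"] = 20
df["d"] = df["a"] + df["c"]

# Delete: drop() returns a new DataFrame, so reassign the result
df = df.drop(columns=["c"])  # remove a column
df = df.drop(index=[1])      # remove a row by its index label
print(df)
```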


Matplotlib


- 2D data visualization (plotting) library
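A minimal sketch of a 2D line plot with Matplotlib's object-oriented API. It assumes a headless environment, so it selects the non-interactive `Agg` backend and renders to an in-memory PNG; the data points are arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple 2D line plot")
ax.legend()

# Save to an in-memory PNG; fig.savefig("plot.png") would write a file instead
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
print(len(buf.getvalue()) > 0)  # True
```

In an interactive session (for example IPython or Jupyter), you would skip the `Agg` line and call `plt.show()` to display the figure.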



IPython

- Useful for data processing and visualization

- Provides an enhanced interactive Python shell (for testing and debugging)



Other notable packages

SciPy

- Python-based ecosystem of open-source software for mathematics, science, and engineering.

SymPy

- SymPy is a Python library for symbolic mathematics.

StatsModels

- Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.

Scikit-Learn

- Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

TensorFlow

- TensorFlow™ is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.

Keras

- Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
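Since scikit-learn is built on NumPy, its estimators take NumPy arrays directly. A minimal sketch of the common `fit`/`predict` pattern using `LinearRegression` on made-up toy data that follows y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data on the line y = 2x + 1; X must be 2D: (n_samples, n_features)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)  # learns coef_ (slope) and intercept_

print(model.coef_, model.intercept_)      # approximately [2.] and 1.0
print(model.predict(np.array([[10.0]])))  # approximately [21.]
```

Most scikit-learn estimators follow this same construct, `fit`, then `predict` shape, so swapping in another model usually changes only the constructor line.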



Anaconda

- Available as Anaconda, Miniconda, and Anaconda Server (paid)
- Bundles the packages commonly needed for data analysis
- IPython, Matplotlib, NumPy, NLTK (natural language processing), Pandas, Pillow (image processing), Jupyter, ...

[Figure: Anaconda Distribution diagram]



