Renting prices in Algeria (Data scraping, EDA, ML)
Youcef Belmokhtar
Data Scientist
Data Analyst
Product Data Analyst
Jupyter Notebook
pandas
scikit-learn
In [231]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Renting price in Algeria
Today, I'm going to analyze rental prices in Algeria (my country). I'll scrape the data from an Algerian website, process it, explore it, and then try to build a machine learning model to predict the final rental price.
The scraped data is used for learning purposes only, with no commercial use!
import requests
from bs4 import BeautifulSoup

# Lists that collect one entry per listing
title, location, agence, price, detail = [], [], [], [], []

url = 'https://darjadida.com/annonces/immobilier?q=location+appartement&per_page='

# Loop over result pages 1 to 99
for z in range(1, 100):
    html = requests.get(url + str(z))
    soup = BeautifulSoup(html.content, 'lxml')

    # Collect the elements of interest on the current page
    titles = soup.find_all('h4')
    locations = soup.find_all('a', class_="listing-address popup-gmaps")
    sizes = soup.find_all(class_="listing-details")
    agences = soup.find_all('div', class_="listing-footer")
    prices = soup.find_all('span', class_="listing-price")
    details = soup.find_all('ul', class_="listing-details")

    # Store the text of each listing field
    for i in range(len(details)):
        title.append(titles[i].text)
        location.append(locations[i].text)
        agence.append(agences[i].text)
        price.append(prices[i].text)
        detail.append(details[i].text)
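The scraped fields are raw strings, so they need cleaning before any numeric analysis. Here is a minimal sketch of that step, assuming the price text looks something like "25 000 DA". The exact format on the site is not shown here, and `parse_number` is a hypothetical helper, not part of the original notebook.

```python
import re

def parse_number(text):
    """Return the digits found in `text` as an int, or None if there are none.

    Assumes whole-number amounts; a decimal separator would be dropped.
    """
    digits = re.sub(r"\D", "", text)  # remove every non-digit character
    return int(digits) if digits else None

# Hypothetical example strings; the real listing text may differ
print(parse_number("25 000 DA"))
print(parse_number("Prix sur demande"))
```

Returning `None` for strings without digits keeps such rows easy to spot and drop later, instead of silently turning them into zeros.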
Agencycount = pd.pivot_table(df, index='Agency', values='Price', aggfunc='count')
Agencycount = Agencycount.sort_values(by='Price', ascending=False)
# Skip the first row, then keep the next five agencies
Agencycount = Agencycount[1:].head(5)
Agencycount.plot(kind='pie', figsize=(8, 8), subplots=True, autopct='%1.1f%%', title='Most popular agencies', legend=False)
Agencyprice = pd.pivot_table(df, index='City', values='Price', aggfunc='count')
Agencyprice = Agencyprice.sort_values(by='Price', ascending=False)
Agencyprice = Agencyprice.head(5)
Agencyprice.plot(kind='bar', figsize=(10, 8), title='Top 5 cities per count', edgecolor='black')
Out[453]:
<AxesSubplot:title={'center':'Top 5 cities per count'}, xlabel='City'>
Bejaia has the highest number of available apartments.
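The same count can be cross-checked without a pivot table. Here is a small sketch using `value_counts` on the `City` column. The rows below are made up for illustration and are not the scraped data.

```python
import pandas as pd

# Toy data for illustration only; not the scraped listings
toy = pd.DataFrame({
    "City": ["Bejaia", "Alger", "Bejaia", "Oran", "Bejaia", "Alger"],
    "Price": [30000, 45000, 28000, 35000, 32000, 50000],
})

# Number of listings per city, sorted from most to least frequent
city_counts = toy["City"].value_counts().head(5)
print(city_counts)
```

Unlike the pivot table with `aggfunc='count'`, `value_counts` also counts rows whose `Price` is missing, so the two results can differ when some prices are not filled in.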
In [489]:
Agencyprice = pd.pivot_table(df, index='City', values='M²', aggfunc='mean')
Agencyprice = Agencyprice.sort_values(by='M²', ascending=False)
Agencyprice = Agencyprice.head(5)
Agencyprice.plot(kind='bar', figsize=(10, 8), title='Top 5 cities per Area', edgecolor='black')
Out[489]:
<AxesSubplot:title={'center':'Top 5 cities per Area'}, xlabel='City'>
The average areas of the top five cities are almost equal.
In [465]:
ddd = df[['Bedrooms', 'Price']].groupby('Bedrooms').mean()
ddd.plot(kind='line', color='b', title='Average Price per number of bedrooms')
Out[465]:
<AxesSubplot:title={'center':'Average Price per number of bedrooms'}, xlabel='Bedrooms'>
The average price increases with the number of bedrooms.
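A mean can be pulled up by a few very expensive listings, so it may help to look at the median and the number of listings in each group as well. Here is a minimal sketch on made-up numbers (not the scraped data), where one outlier is placed in the two-bedroom group on purpose.

```python
import pandas as pd

# Toy data for illustration only; the 200000 value is a deliberate outlier
toy = pd.DataFrame({
    "Bedrooms": [1, 1, 2, 2, 2, 3],
    "Price": [20000, 24000, 30000, 34000, 200000, 45000],
})

# Count, mean and median price for each number of bedrooms
summary = toy.groupby("Bedrooms")["Price"].agg(["count", "mean", "median"])
print(summary)
```

In this toy example the two-bedroom mean is far above its median, which is the kind of gap that would suggest checking for outliers before trusting the line plot.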
In [483]:
ddd1 = df[['M²', 'Price']].groupby('M²').mean()
ddd1.plot(color='b', title='Average Price per M²', figsize=(12, 6))
Out[483]:
<AxesSubplot:title={'center':'Average Price per M²'}, xlabel='M²'>
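Grouping by every distinct area value can give a noisy curve, because many areas appear only once. One possible alternative is to group the areas into ranges first with `pd.cut`. Here is a rough sketch on made-up data, with bin edges chosen only for this example; the `Area range` column is a hypothetical name.

```python
import pandas as pd

# Toy data for illustration only; not the scraped listings
toy = pd.DataFrame({
    "M²": [45, 60, 75, 90, 110, 130],
    "Price": [20000, 25000, 30000, 38000, 45000, 60000],
})

# Assumed bin edges; intervals are (0, 50], (50, 100], (100, 150]
bins = [0, 50, 100, 150]
toy["Area range"] = pd.cut(toy["M²"], bins=bins)

# Average price within each area range
binned = toy.groupby("Area range", observed=True)["Price"].mean()
print(binned)
```

The bin width trades detail for stability: wider ranges smooth the curve more but hide differences between apartments of similar size.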